agents that recognize and interact with recognition sequences uniquely characteristic of a protein or a set of proteins (Proteome Epitope Tags, or PETs) in the sample. Arrays comprising these capture agents or PETs are also provided.

Proteome Epitope Tags and Methods of Use Thereof in Protein
Modification Analysis

Background of the Invention 

Genomic studies are now approaching "industrial" speed and scale, thanks to
advances in gene sequencing and the increasing availability of high-throughput
methods for studying genes, the proteins they encode, and the pathways in which 
they are involved. The development of DNA microarrays has enabled massively 
parallel studies of gene expression as well as genomic DNA variations. 

DNA microarrays have shown promise in advanced medical diagnostics.
More specifically, several groups have shown that when the gene expression
patterns of normal and diseased tissues are compared at the whole genome level,
patterns of expression characteristic of the particular disease state can be observed.
Bittner et al., (2000) Nature 406:536-540; Clark et al., (2000) Nature 406:532-535;
Huang et al., (2001) Science 294:870-875; and Hughes et al., (2000) Cell 102:109-126.
For example, tissue samples from patients with malignant forms of prostate
cancer display a recognizably different pattern of mRNA expression from tissue
samples from patients with a milder form of the disease. Cf. Dhanasekaran et al.,
(2001) Nature 412:822-826.

However, as James Watson pointed out recently, proteins are really the
"actors in biology" ("A Cast of Thousands," Nature Biotechnology, March 2003). A
more attractive approach would be to monitor key proteins directly. These might be
biomarkers identified by DNA microarray analysis. In this case, the assay required
might be relatively simple, examining only 5-10 proteins. Another approach would
be to use an assay that detects hundreds or thousands of protein features, such as for
the direct analysis of blood, sputum or urine samples, etc. It is reasonable to believe
that the body would react in a specific way to a particular disease state and produce 
a distinct "biosignature" in a complex data set, such as the levels of 500 proteins in 
the blood. One could imagine that in the future a single blood test could be used to 
diagnose most conditions. 

The motivation for the development of large-scale protein detection assays as
basic research tools is different from that for their development for medical diagnostics.
In basic research, biosignatures are valued as a means of understanding
the molecular basis of cellular response to a particular genetic, physiological or
environmental stimulus. DNA microarrays do a good job in this role, but detection
of proteins would allow for more accurate determination of protein levels and, more
importantly, could be designed to quantitate the presence of different splice variants
or isoforms. These events, to which DNA microarrays are largely or completely
blind, often have pronounced effects on protein activities.

This has sparked great interest in the development of devices such as protein-detecting
microarrays (PDMs) to allow similar experiments to be done at the protein
level, particularly in the development of devices capable of monitoring the levels of 
hundreds or thousands of proteins simultaneously. 

Prior to the present invention, PDMs that even approach the complexity of
DNA microarrays did not exist. There are several problems with the current
approaches to massively parallel, e.g., cell-wide or proteome-wide, protein detection.
First, reagent generation is difficult: to obtain a detection agent against every
protein in an organism, one needs to first isolate every individual target protein and
then develop detection agents against the purified protein. Since the number of
proteins in the human organism is currently estimated to be about 30,000, this
requires a lot of time (years) and resources. Furthermore, detection agents against
native proteins have less defined specificity since it is a difficult task to know which
part of the proteins the detection agents recognize. This problem causes considerable
cross-reactivity when multiple detection agents are arrayed together, making
large-scale protein detection arrays difficult to construct. Second, current methods
achieve poor coverage of all possible proteins in an organism. These methods
typically include only the soluble proteins in biological samples. They often fail to
distinguish splice variants, which are now appreciated as being ubiquitous. They
exclude a large number of proteins that are bound in organellar and cellular
membranes or are insoluble when the sample is processed for detection. Third,
current methods are not general to all proteins or to all types of biological samples.
Proteins vary quite widely in their chemical character. Groups of proteins require
different processing conditions in order to keep them stably solubilized for
detection. Any one condition may not suit all the proteins. Further, biological
samples vary in their chemical character. Individual cells considered identical
express different proteins over the course of their generation and ultimate death.
Physiological fluids like urine and blood serum are relatively simple, but biopsy
tissue samples are very complex. Different protocols need to be used to process each
type of sample and achieve maximal solubilization and stabilization of proteins.

Current detection methods are either not effective over all proteins uniformly 
or cannot be highly multiplexed to enable simultaneous detection of a large number 
of proteins (e.g., > 5,000). Optical detection methods would be most cost effective 
but suffer from lack of uniformity over different proteins. Proteins in a sample have
to be labeled with dye molecules and the different chemical character of proteins 
leads to inconsistency in efficiency of labeling. Labels may also interfere with the 
interactions between the detection agents and the analyte protein leading to further 
errors in quantitation. Non-optical detection methods have been developed but are 
quite expensive in instrumentation and are very difficult to multiplex for parallel
detection of even moderately large samples (e.g., > 100 samples). 

Another problem with current technologies is that they are burdened by
intracellular life processes involving a complex web of protein complex formation,
multiple enzymatic reactions altering protein structure, and protein conformational
changes. These processes can mask or expose binding sites known to be present in a
sample. For example, prostate specific antigen (PSA) is known to exist in serum in
multiple forms including free (unbound) forms, e.g., pro-PSA, BPSA (BPH-associated
free PSA), and complexed forms, e.g., PSA-ACT, PSA-A2M (PSA-alpha2-macroglobulin),
and PSA-API (PSA-alpha1-protease inhibitor) (see Stephan
C. et al. (2002) Urology 59:2-8). Similarly, Cyclin E is known to exist not only as a
full length 50 kD protein, but also in five other low molecular weight forms ranging
in size from 34 to 49 kD. In fact, the low molecular weight forms of cyclin E are
believed to be more sensitive markers for breast cancer than the full length protein
(see Keyomarsi K. et al. (2002) N. Engl. J. Med. 347(20):1566-1575).

Sample collection and handling prior to a detection assay may also affect the
nature of proteins that are present in a sample and, thus, the ability to detect these
proteins. As indicated by Evans M. J. et al. (2001) Clinical Biochemistry 34:107-112
and Zhang D. J. et al. (1998) Clinical Chemistry 44(6):1325-1333, standardizing
immunoassays is difficult due to the variability in sample handling and protein
stability in plasma or serum. For example, PSA sample handling, such as sample
freezing, affects the stability and the relative levels of the different forms of PSA in
the sample (Leinonen J, Stenman UH (2000) Tumour Biol. 21(1):46-53).

Finally, current technologies are burdened by the presence of autoantibodies 
which affect the outcome of immunoassays in unpredictable ways, e.g., by leading 
to analytical errors (Fitzmaurice T. F. et al. (1998) Clinical Chemistry 44(10):2212-2214).

These problems prompted the question whether it is even possible to
standardize immunoassays for heterogeneous protein antigens (Stenman U-H. (2001)
Immunoassay Standardization: Is it possible? Who is responsible? Who is capable?
Clinical Chemistry 47(5):815-820). Thus, a great need exists in the art for efficient
and simple methods of parallel detection of proteins that are expressed in a
biological sample and, particularly, for methods that can overcome the imprecisions
caused by the complexity of protein chemistry and for methods which can detect all
or a majority of the proteins expressed in a given cell type at a given time, or for
proteome-wide detection and quantitation of proteins expressed in biological
samples.

Summary of the Invention 

The present invention is directed to methods and reagents for reproducible
protein detection and quantitation, e.g., parallel detection and quantitation, in
complex biological samples. Salient features of certain embodiments of the present
invention reduce the complexity of reagent generation, achieve greater coverage of
all protein classes in an organism, greatly simplify the sample processing and
analyte stabilization process, enable effective and reliable parallel detection,
e.g., by optical or other automated detection methods, and quantitation of proteins
and/or post-translationally modified forms, and enable multiplexing of standardized
capture agents for proteins with minimal cross-reactivity and well-defined
specificity for large-scale, proteome-wide protein detection.

Embodiments of the present invention also overcome the imprecisions in
detection methods caused by: the existence of proteins in multiple forms in a sample
(e.g., various post-translationally modified forms or various complexed or
aggregated forms); the variability in sample handling and protein stability in a
sample, such as plasma or serum; and the presence of autoantibodies in samples. In
certain embodiments, using a targeted fragmentation protocol, the methods of the
present invention assure that a binding site on a protein of interest, which may have
been masked due to one of the foregoing reasons, is made available to interact with a
capture agent. In other embodiments, the sample proteins are subjected to conditions
in which they are denatured, and optionally are alkylated, so as to render buried (or
otherwise cryptic) PET moieties accessible to solvent and interaction with capture
agents. As a result, the present invention allows for detection methods having
increased sensitivity and more accurate protein quantitation capabilities. This
advantage of the present invention will be particularly useful in, for example, protein
marker-type disease detection assays (e.g., PSA or Cyclin E based assays) as it will
allow for an improvement in the predictive value, sensitivity, and reproducibility of
these assays. The present invention can standardize detection and measurement
assays for all proteins from all samples.

For example, a recent study by Punglia et al. (N. Engl. J. Med. 349(4):335-42,
July 2003) indicated that, in the standard PSA-based screening for prostate
cancer, if the threshold PSA value for undergoing biopsy were set at 4.1 ng per
milliliter, 82 percent of cancers in younger men and 65 percent of cancers in older
men would be missed. Thus a lower threshold level of PSA for recommending
prostate biopsy, particularly in younger men, may improve the clinical value of the
PSA test. However, at lower detection limits, background can become a significant
issue. It would be immensely advantageous if the sensitivity / selectivity of the assay
could be improved by, for example, the method of the instant invention.

In a specific embodiment, the invention provides a method to detect and
quantitate the presence of specific modified polypeptides in a sample. In a general
sense, the invention provides a method to identify a unique recognition sequence (URS) or PET uniquely
associated with a modification site on a peptide fragment, which PET can then be
captured and detected / quantitated by specific capture agents. The method applies to
virtually all kinds of post-translational modifications, including but not limited to
phosphorylation, glycosylation, etc., as long as the modification can be reliably
detected, for example, by phospho-antibodies. The method also applies to the
detection of alternative splicing forms of otherwise identical proteins.

The present invention is based, at least in part, on the realization that
exploitation of unique recognition sequences (URSs) or Proteome Epitope Tags
(PETs) present within individual proteins can enable reproducible detection and 
quantitation of individual proteins in parallel in a milieu of proteins in a biological 
sample. As a result of this PET-based approach, the methods of the invention detect 
specific proteins in a manner that does not require preservation of the whole protein,
nor even its native tertiary structure, for analysis. Moreover, the methods of the 
invention are suitable for the detection of most or all proteins in a sample, including 
insoluble proteins such as cell membrane bound and organelle membrane bound 
proteins. 
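
By way of illustration only, the notion of a sequence "uniquely characteristic" of a protein can be sketched computationally. The following minimal Python sketch assumes that a PET is simply a fixed-length peptide (k-mer) occurring in exactly one protein of a given set. The function name `find_unique_pets` and the toy sequences are hypothetical and do not appear in the source.

```python
from collections import defaultdict


def find_unique_pets(proteome, k=5):
    """Map each protein name to the k-mers that occur in no other protein.

    Assumption: a PET is modeled as a contiguous k-mer found in exactly
    one protein of `proteome` (a dict of name -> amino acid sequence).
    """
    owners = defaultdict(set)  # k-mer -> names of proteins containing it
    for name, seq in proteome.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    pets = {name: [] for name in proteome}
    for kmer, names in owners.items():
        if len(names) == 1:  # present in a single protein only
            pets[next(iter(names))].append(kmer)
    return {name: sorted(kmers) for name, kmers in pets.items()}


# Toy, made-up sequences for illustration only.
toy_proteome = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYLLKQR",
}
pets = find_unique_pets(toy_proteome, k=5)
```

In this toy set the shared N-terminal pentamer is excluded for both proteins, while pentamers spanning the differing residues are retained as candidate tags.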

The present invention is also based, at least in part, on the realization that
PETs can serve as Proteome Epitope Tags characteristic of a specific organism's
proteome and can enable the recognition and detection of a specific organism. 

The present invention is also based, at least in part, on the realization that 
high-affinity agents (such as antibodies) with predefined specificity can be generated 
for defined, short-length peptides and that, when antibodies recognize protein or peptide
epitopes, only 4-6 (on average) amino acids are critical. See, for example, Lerner
RA (1984) Advances in Immunology 36:1-45.

The present invention is also based, at least in part, on the realization that by 
denaturing (including thermo- and/or chemical- denaturation) and/or fragmenting 
(such as by protease digestion including digestion by thermo-protease) all proteins in
a sample to produce a soluble set of protein analytes, e.g., in which even otherwise 
buried PETs including PETs in protein complexes / aggregates are solvent 
accessible, the subject method provides a reproducible and accurate (intra-assay and 
inter-assay) measurement of proteins. 

The present invention is also based, at least in part, on the realization that
protein modifications associated with PETs on a fragmented peptide can be readily
detected and quantitated by isolating the associated PET followed by detection /
quantitation of the modification.

Accordingly, in one aspect, the present invention provides a method for 
globally detecting the presence of a protein(s) (e.g., membrane bound protein(s)) in
an organism's proteome. The method includes providing a sample which has been
denatured and/or fragmented to generate a collection of soluble polypeptide
analytes; contacting the polypeptide analytes with a plurality of capture agents (e.g.,
capture agents immobilized on a solid support such as an array) under conditions
such that interaction of the capture agents with corresponding unique recognition
sequences occurs, thereby globally detecting the presence of protein(s) in an
organism's proteome. 

The method is suitable for use in, for example, diagnosis (e.g., clinical 
diagnosis or environmental diagnosis), drug discovery, protein sequencing or protein 
profiling. In one embodiment, at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%,
90%, 95% or 100% of an organism's proteome is detectable from arrayed capture
agents. 

The capture agent may be a protein, a peptide, an antibody, e.g., a single 
chain antibody, an artificial protein, an RNA or DNA aptamer, an allosteric 
ribozyme, a small molecule or electronic means of capturing a PET. 

The sample to be tested (e.g., a human, yeast, mouse, C. elegans, Drosophila
melanogaster or Arabidopsis thaliana sample, such as a whole cell lysate) may be
fragmented by the use of a proteolytic agent. The proteolytic agent can be any agent
that is capable of predictably cleaving polypeptides between specific amino acid
residues (i.e., with a defined proteolytic cleavage pattern). The predictability of cleavage allows
a computer to generate fragmentation patterns in silico, which will greatly aid the
process of searching for PETs unique to a sample.
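
As a minimal sketch of such in silico fragmentation, the Python example below cuts a sequence after every residue in a given set. The rules are deliberately simplified (for instance, no exceptions for neighboring residues and no missed cleavages), and the names `digest`, `RULES` and the toy sequence are hypothetical illustrations rather than anything specified in the source.

```python
def digest(sequence, cleave_after):
    """Cut `sequence` after every residue found in `cleave_after`.

    Simplifying assumption: cleavage is complete and depends only on the
    residue immediately preceding the cut.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing piece with no cleavage residue
        fragments.append(sequence[start:])
    return fragments


# Simplified, assumed cleavage rules keyed by hypothetical labels.
RULES = {
    "trypsin": "KR",
    "chymotrypsin": "WFY",
    "v8_glu": "E",
    "post_proline": "P",
}

toy = "MKTAYIAKQRWEPLF"  # made-up sequence for illustration
tryptic = digest(toy, RULES["trypsin"])
chymotryptic = digest(toy, RULES["chymotrypsin"])
```

Because the fragments are produced deterministically from the sequence and the rule, the same computation can be repeated over every protein in a sequence database before searching the fragments for candidate PETs.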

According to one embodiment of this aspect of the present invention, a
proteolytic agent is a proteolytic enzyme. Examples of proteolytic enzymes include,
but are not limited to, trypsin, calpain, carboxypeptidase, chymotrypsin, V8 protease,
pepsin, papain, subtilisin, thrombin, elastase, Glu-C, endo Lys-C, proteinase K,
caspase-1, caspase-2, caspase-3, caspase-4, caspase-5, caspase-6, caspase-7,
caspase-8, MetAP-2, adenovirus protease, HIV protease and the like.

The following table summarizes the result of analyzing pentamer PETs in the 
human proteome using different proteases. A total of 23,446 sequences are tagged 
before protease digestion. 


Protease                        Cleavage Site    Fragment Length    Tagged Proteins
Chymotrypsin                    after W, F, Y    12.7               21,990
S.A. V-8 E specific             after E          13.7               23,120
Post-Proline Cleaving Enzyme    after P          15.7               23,009
Trypsin                         after K, R       8.5                22,408
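
For a quick comparison of the proteases in the table above, the tagged counts can be expressed as fractions of the 23,446 sequences tagged before digestion. The short Python sketch below does only that arithmetic; it assumes that the "Tagged Proteins" column counts sequences that still carry at least one pentamer PET after digestion, which is an interpretation rather than something the table states. The variable names are hypothetical.

```python
# Counts copied from the table; the interpretation of the column is an assumption.
TOTAL_TAGGED_BEFORE_DIGESTION = 23446

tagged_after_digestion = {
    "Chymotrypsin": 21990,
    "S.A. V-8 E specific": 23120,
    "Post-Proline Cleaving Enzyme": 23009,
    "Trypsin": 22408,
}

# Fraction of the pre-digestion total that remains tagged for each protease.
retained = {
    protease: count / TOTAL_TAGGED_BEFORE_DIGESTION
    for protease, count in tagged_after_digestion.items()
}

for protease, fraction in sorted(retained.items(), key=lambda kv: -kv[1]):
    print(f"{protease}: {fraction:.1%}")
```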


According to another embodiment of this aspect of the present invention, a
proteolytic agent is a proteolytic chemical such as cyanogen bromide or 2-nitro-5-thiocyanobenzoate.
In still other embodiments, the proteins of the test sample can be
fragmented by physical shearing, by sonication, or by some combination of these or
other treatment steps.

An important feature for certain embodiments, particularly when analyzing 
complex samples, is to develop a fragmentation protocol that is known to 
reproducibly generate peptides, preferably soluble peptides, which serve as the
unique recognition sequences. The collection of polypeptide analytes generated from
the fragmentation may be 5-30, 5-20, 5-10, 10-20, 20-30, or 10-30 amino acids long, 
or longer. Ranges intermediate to the above recited values, e.g., 7-15 or 15-25 are 
also intended to be part of this invention. For example, ranges using a combination 
of any of the above recited values as upper and/or lower limits are intended to be
included.

The unique recognition sequence may be a linear sequence or a non-contiguous
sequence and may be 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 25, or 30 amino acids in length. In certain embodiments, the unique recognition
sequence is selected from the group consisting of SEQ ID NOs: 1-546 or a
sub-collection thereof.

In one embodiment, the protein(s) being detected is characteristic of a
pathogenic organism, e.g., anthrax, smallpox, cholera toxin, Staphylococcus aureus
α-toxin, Shiga toxin, cytotoxic necrotizing factor type 1, Escherichia coli heat-stable
toxin, botulinum toxins, or tetanus neurotoxins.

In another aspect, the present invention provides a method for detecting the 
presence of a protein, preferably simultaneous or parallel detection of multiple 
proteins, in a sample. The method includes providing a sample which has been
denatured and/or fragmented to generate a collection of soluble polypeptide
analytes; providing an array comprising a support having a plurality of discrete 
regions to which are bound a plurality of capture agents, wherein each of the capture 
agents is bound to a different discrete region and wherein each of the capture agents 
is able to recognize and interact with a unique recognition sequence within a protein;
contacting the array of capture agents with the polypeptide analytes; and
determining which discrete regions show specific binding to the sample, thereby 
detecting the presence of a protein in a sample. 
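
The final step of this method, determining which discrete regions show specific binding, can be sketched as a lookup from array position to protein. The Python example below is a minimal illustration that assumes each discrete region reports a single numeric signal and that a fixed threshold separates specific binding from background. The function `call_proteins`, the coordinates, the signal values, and the threshold are all hypothetical.

```python
def call_proteins(signals, region_map, threshold):
    """Return {protein: signal} for regions whose signal meets `threshold`.

    `region_map` maps an array position to (PET, protein); `signals` maps
    an array position to a measured value. Both layouts are assumptions.
    """
    calls = {}
    for position, signal in signals.items():
        if position in region_map and signal >= threshold:
            _pet, protein = region_map[position]
            calls[protein] = signal
    return calls


# Hypothetical array layout: (row, column) -> (PET, protein of origin).
region_map = {
    (0, 0): ("AYIAK", "protA"),
    (0, 1): ("LLKQR", "protB"),
    (1, 0): ("WEPLF", "protC"),
}
# Hypothetical readout in arbitrary units.
signals = {(0, 0): 1520.0, (0, 1): 80.0, (1, 0): 940.0}

present = call_proteins(signals, region_map, threshold=500.0)
```

Keeping the position-to-PET mapping separate from the readout means the same array layout can be reused across samples, with only the signal table changing.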

To further illustrate, the present invention provides a packaged protein 
detection array. Such arrays may include an addressable array having a plurality of
features, each feature independently including a discrete type of capture agent that
selectively interacts with a unique recognition sequence (URS) or PET of an analyte 
protein, e.g., under conditions in which the analyte protein is a soluble protein 
produced by proteolysis and/or denaturation. The features of the array are disposed 
in a pattern or with a label to provide the identity of interactions between analytes
and the capture agents, e.g., to ascertain the identity and/or quantity of a protein
occurring in the sample. The packaged array may also include instructions for (i) 
contacting the addressable array with a sample containing polypeptide analytes 
produced by denaturation and/or cleavage of proteins at amide backbone positions; 
(ii) detecting interaction of said polypeptide analytes with said capture agent
moieties; and (iii) determining the identity of polypeptide analytes, or native
proteins from which they are derived, based on interaction with capture agent
moieties.

In yet a further aspect, the present invention provides a method for detecting 
the presence of a protein in a sample by providing a sample which has been 
denatured and/or fragmented to generate a collection of soluble polypeptide 
analytes; contacting the sample with a plurality of capture agents, wherein each of
the capture agents is able to recognize and interact with a unique recognition 
sequence within a protein, under conditions such that the presence of a protein in the 
sample is detected. 

In another aspect, the present invention provides a method for detecting the 
presence of a protein in a sample by providing an array of capture agents comprising
a support having a plurality of discrete regions (features) to which are bound a 
plurality of capture agents, wherein each of the capture agents is bound to a different 
discrete region and wherein the plurality of capture agents are capable of interacting 
with at least 50% of an organism's proteome; contacting the array with the sample; 
and determining which discrete regions show specific binding to the sample, thereby
detecting the presence of a protein in the sample. 

In a further aspect, the present invention provides a method for globally 
detecting the presence of a protein(s) in an organism's proteome by providing a 
sample comprising the protein and contacting the sample with a plurality of capture 
agents under conditions such that interaction of the capture agents with
corresponding unique recognition sequences occurs, thereby globally detecting the 
presence of protein(s) in an organism's proteome. 

In another aspect, the present invention provides a plurality of capture 
agents, wherein the plurality of capture agents are capable of interacting with at least 
50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 100% of an organism's
proteome and wherein each of the capture agents is able to recognize and interact 
with a unique recognition sequence within a protein. 

In yet another aspect, the present invention provides an array of capture 
agents, which includes a support having a plurality of discrete regions to which are 
bound a plurality of capture agents (e.g., at least 10, 20, 30, 40, 50, 60, 70, 80, 90,
100, 200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
10000, 11000, 12000 or 13000 different capture agents), wherein each of the capture
agents is bound to a different discrete region and wherein each of the capture agents
is able to recognize and interact with a unique recognition sequence within a protein.
The capture agents may be attached to the support, e.g., via a linker, at a density of
50, 100, 150, 200, 250, 300, 350, 400, 450, 500 or 1000 capture agents/cm². In one
embodiment, each of the discrete regions is physically separated from each of the 
other discrete regions. 

The capture agent array can be produced on any suitable solid surface,
including a silicon, plastic, glass, polymer (such as cellulose, polyacrylamide, nylon,
polystyrene, polyvinyl chloride or polypropylene), ceramic, photoresist or rubber
surface. Preferably, the silicon surface is a silicon dioxide or a silicon nitride
surface. Also preferably, the array is made in a chip format. The solid surfaces may
be in the form of tubes, beads, discs, silicon chips, microplates, polyvinylidene
difluoride (PVDF) membrane, nitrocellulose membrane, nylon membrane, other
porous membrane, non-porous membrane (e.g., plastic, polymer, perspex, silicon,
amongst others), a plurality of polymeric pins, or a plurality of microtitre wells, or
any other surface suitable for immobilizing proteins and/or conducting an
immunoassay or other binding assay.

The capture agent may be a protein, a peptide, an antibody, e.g., a single 
chain antibody, an artificial protein, an RNA or DNA aptamer, an allosteric
ribozyme or a small molecule. 

In a further aspect, the present invention provides a composition comprising 
a plurality of isolated unique recognition sequences, wherein the unique recognition 
sequences are derived from at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 
90%, 95% or 100% of an organism's proteome. In one embodiment, each of the
unique recognition sequences is derived from a different protein. 

In another aspect, the present invention provides a method for preparing an 
array of capture agents. The method includes providing a plurality of isolated unique 
recognition sequences, the plurality of unique recognition sequences derived from at 
least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 100% of an
organism's proteome; generating a plurality of capture agents capable of binding the 
plurality of unique recognition sequences; and attaching the plurality of capture 
agents to a support having a plurality of discrete regions, wherein each of the capture
agents is bound to a different discrete region, thereby preparing an array of capture
agents. 

In one fundamental aspect, the invention provides an apparatus for detecting 
simultaneously the presence of plural specific proteins in a multi-protein sample, 
e.g., a body fluid sample or a cell sample produced by lysing a natural tissue sample
or microorganism sample. The apparatus comprises a plurality of immobilized 
capture agents for contact with the sample and which include at least a subset of 
agents which respectively bind specifically with individual unique recognition 
sequences, and means for detecting binding events between respective capture
agents and the unique recognition sequences, e.g., probes for detecting the presence
and/or concentration of unique recognition sequences bound to the capture agents. 
The unique recognition sequences are selected such that the presence of each 
sequence is unambiguously indicative of the presence in the sample (before it is 
fragmented) of a target protein from which it was derived. Each sample is treated
with a set proteolytic protocol so that the unique recognition sequences are
generated reproducibly. Optionally, the means for detecting binding events may 
include means for detecting data indicative of the amount of bound unique 
recognition sequence. This permits assessment of the relative quantity of at least two 
target proteins in said sample. 

The invention also provides methods for simultaneously detecting the
presence of plural specific proteins in a multi-protein sample. The methods comprise
denaturing and/or fragmenting proteins in a sample using a predetermined protocol
to generate plural unique recognition sequences, the presence of which in the sample
is unambiguously indicative of the presence of the target proteins from which they
were derived. At least a portion of the unique recognition sequences in the sample are
contacted with plural capture agents which bind specifically to at least a portion of
the unique recognition sequences. Detection of binding events to particular unique
recognition sequences indicates the presence of target proteins corresponding to those
sequences.

In another aspect, the present invention provides methods for improving the
reproducibility of protein binding assays conducted on biological samples. The
improvement enables detecting the presence of the target protein with greater
effective sensitivity, or quantitating the protein more reliably (i.e., reducing standard
deviation). The methods include: (1) treating the sample using a pre-determined
protocol which A) inhibits masking of the target protein caused by target protein-protein
non-covalent or covalent complexation or aggregation, target protein
degradation or denaturing, target protein post-translational modification, or
environmentally induced alteration in target protein tertiary structure, and B)
fragments the target protein to thereby produce at least one peptide epitope (i.e., a
PET) whose concentration is directly proportional to the true concentration of the
target protein in the sample; (2) contacting the so-treated sample with a capture
agent for the PET under suitable binding conditions; and (3) detecting binding
events qualitatively or quantitatively.

For certain embodiments of the subject assay, the capture agents that are
made available according to the teachings herein can be used to develop multiplex
assays having increased sensitivity, dynamic range and/or recovery rates relative to,
for example, ELISA and other immunoassays. Such improved performance
characteristics can include one or more of the following: a regression coefficient
(R²) of 0.95 or greater for a reference standard, e.g., a comparable control sample,
more preferably an R² greater than 0.97, 0.99 or even 0.995; an average recovery
rate of at least 50 percent, and more preferably at least 60, 75, 80 or even 90 percent;
an average positive predictive value for the occurrence of proteins in a sample of at
least 90 percent, more preferably at least 95, 98 or even 99 percent; an average
diagnostic sensitivity (DSN) for the occurrence of proteins in a sample of 99 percent
or higher, more preferably at least 99.5 or even 99.8 percent; and an average diagnostic
specificity (DSP) for the occurrence of proteins in a sample of 99 percent or higher,
more preferably at least 99.5 or even 99.8 percent.
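
The performance characteristics recited above can be computed from assay data using their conventional definitions. The Python sketch below assumes that R² is the squared Pearson correlation between measured and reference values, that recovery is measured amount over spiked amount, and that predictive value, sensitivity and specificity follow the usual confusion-matrix formulas. The source does not define these quantities, so these definitions, the function names, and the numbers are assumptions for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def recovery_rate(measured, spiked):
    """Percent of a known spiked amount that the assay recovers."""
    return 100.0 * measured / spiked


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and positive predictive value as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Made-up numbers for illustration only.
r2 = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
rec = recovery_rate(measured=45.0, spiked=50.0)
metrics = classification_metrics(tp=99, fp=1, tn=99, fn=1)
```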

Another aspect of the invention provides a method for detecting the presence
of a post-translational modification on a target protein within a sample, comprising:
(1) computationally analyzing the amino acid sequence of said target protein to identify
one or more candidate sites for said post-translational modification; (2)
computationally identifying the amino acid sequence of one or more fragments of
said target protein, wherein said fragment predictably results from a treatment of said
target protein within said sample, and said fragment encompasses said potential post-
translational modification site and a PET (proteome epitope tag) unique to said
fragment within said sample; (3) generating a capture agent that specifically binds
said PET, and immobilizing said capture agent to a support; (4) subjecting said
sample to said treatment to render said fragment soluble in solution, and contacting
said sample after said treatment to said capture agent; and (5) detecting, on said fragment
bound to said capture agent, the presence or absence of said post-translational
modification.
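
By way of non-limiting illustration only, computational steps (1) and (2) above may be sketched in Python as follows. The sketch assumes that the treatment is a trypsin digestion modeled by a simplified rule (cleavage after K or R unless followed by P), that a PET is a fixed-length window of k residues, and that "unique" means absent from every other fragment of the digested sample. The function names and the toy sequences are hypothetical.

```python
import re

def tryptic_digest(sequence):
    """Simplified trypsin rule: cut after K or R, but not before P."""
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", sequence) if frag]

def fragment_with_site(sequence, site_index):
    """Return (fragment, start offset) of the fragment covering a 0-based site."""
    offset = 0
    for frag in tryptic_digest(sequence):
        if offset <= site_index < offset + len(frag):
            return frag, offset
        offset += len(frag)
    raise ValueError("site index outside sequence")

def unique_pets(fragment, all_fragments, k=5):
    """k-residue windows of `fragment` found in no other fragment of the sample.

    Fragments identical to `fragment` are skipped, so a fragment produced
    by two different proteins would need separate handling.
    """
    others = [f for f in all_fragments if f != fragment]
    windows = {fragment[i:i + k] for i in range(len(fragment) - k + 1)}
    return sorted(w for w in windows if not any(w in f for f in others))

# Toy usage: a target with a candidate tyrosine at index 13, plus one
# unrelated protein that shares part of the same region.
target = "MKTAYIAKQRSPEYLDGHK"
other = "MKTAYLAKGGSPEYLDK"
sample = tryptic_digest(target) + tryptic_digest(other)
fragment, start = fragment_with_site(target, 13)
candidate_pets = unique_pets(fragment, sample)
```

An embodiment in which the PET and the candidate site do not overlap could be handled by additionally discarding windows that cover `site_index - start`.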

In one embodiment, said post-translational modification is acetylation, 
amidation, deamidation, prenylation, formylation, glycosylation, hydroxylation,
methylation, myristoylation, phosphorylation, ubiquitination, ribosylation or 
sulphation. 

In one embodiment, said post-translational modification is phosphorylation 
on tyrosine, serine or threonine. 

In one embodiment, said step of computationally analyzing amino acid
sequences includes a Nearest-Neighbor Analysis that identifies said PET based on
criteria that also include one or more of pI, charge, steric, solubility, hydrophobicity,
polarity and solvent exposed area.
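
By way of non-limiting illustration only, one possible form of such an analysis is sketched below in Python. This passage does not define the Nearest-Neighbor Analysis, so the sketch assumes that a nearest neighbor of a PET is any same-length sequence in the digested proteome that differs from the PET at a small number of positions (Hamming distance). The function names and the mismatch threshold are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbors(pet, proteome_fragments, max_mismatches=1):
    """Windows in the proteome within `max_mismatches` of the PET (exact hits excluded)."""
    k = len(pet)
    hits = set()
    for frag in proteome_fragments:
        for i in range(len(frag) - k + 1):
            window = frag[i:i + k]
            if 0 < hamming(pet, window) <= max_mismatches:
                hits.add(window)
    return sorted(hits)
```

A candidate PET with few or no such neighbors would be expected to present less risk of cross-reactivity; the additional criteria recited above (pI, charge, hydrophobicity and the like) could be applied as further filters on the surviving candidates.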

In one embodiment, the method further comprises determining the specificity
of said capture agent generated in (3) against one or more nearest neighbor(s), if
any, of said PET.

In one embodiment, a peptide competition assay is used in determining the
specificity of said capture agent generated in (3) against said nearest neighbor(s) of
said PET.

In one embodiment, said step of computationally analyzing amino acid
sequences includes a solubility analysis that identifies PETs that are predicted to
have at least a threshold solubility under a designated solution condition.

In one embodiment, the length of said PET is selected from 5-10 amino
acids, 10-15 amino acids, 15-20 amino acids, 20-25 amino acids, 25-30 amino acids,
or 30-40 amino acids.

In one embodiment, said capture agent is a full-length antibody, or a
functional antibody fragment selected from: an Fab fragment, an F(ab')2 fragment,
an Fd fragment, an Fv fragment, a dAb fragment, an isolated complementarity
determining region (CDR), a single chain antibody (scFv), or derivative thereof.

In one embodiment, said capture agent is selected from nucleotides; nucleic acids; PNA
(peptide nucleic acids); proteins; peptides; carbohydrates; artificial polymers; or
small organic molecules.

In one embodiment, said capture agent is aptamers, scaffolded peptides, or 
small organic molecules. 

In one embodiment, said treatment is denaturation and/or fragmentation of
said sample by a protease, a chemical agent, physical shearing, or sonication.

In one embodiment, said denaturation is thermo-denaturation or chemical 
denaturation. 

In one embodiment, said thermo-denaturation is followed by or concurrent 
with proteolysis using thermo-stable proteases.

In one embodiment, said thermo-denaturation comprises two or more cycles 
of thermo-denaturation followed by protease digestion. 

In one embodiment, said fragmentation is carried out by a protease selected 
from trypsin, chymotrypsin, pepsin, papain, carboxypeptidase, calpain, subtilisin, 
Glu-C, endo Lys-C, or proteinase K.

In one embodiment, said sample is a body fluid selected from: saliva, 
mucous, sweat, whole blood, serum, urine, amniotic fluid, genital fluid, fecal 
material, marrow, plasma, spinal fluid, pericardial fluid, gastric fluid, abdominal 
fluid, peritoneal fluid, pleural fluid, synovial fluid, cyst fluid, cerebrospinal fluid, 
lung lavage fluid, lymphatic fluid, tears, prostatic fluid, extraction from other body
parts, or secretion from other glands; or from supernatant, whole cell lysate, or cell 
fraction obtained by lysis and fractionation of cellular material, extract or fraction of 
cells obtained directly from a biological entity or cells grown in an artificial 
environment. 

In one embodiment, said sample is obtained from human, mouse, rat, frog
(Xenopus), fish (zebra fish), fly (Drosophila melanogaster), nematode (C. elegans),
fission or budding yeast, or plant (Arabidopsis thaliana). 

In one embodiment, said sample is produced by treatment of membrane 
bound proteins. 

In one embodiment, said treatment is carried out under conditions to preserve
said post-translational modification.

In one embodiment, said PET and said candidate site for said post- 
translational modification do not overlap. 

In one embodiment, said capture agent is optimized for selectivity for said 
PET under denaturing conditions.

In one embodiment, step (5) is effectuated by using a secondary capture 
agent specific for said post-translational modification, wherein said secondary 
capture agent is labeled by a detectable moiety selected from: an enzyme, a 
fluorescent label, a stainable dye, a chemiluminescent compound, a colloidal
particle, a radioactive isotope, a near-infrared dye, a DNA dendrimer, a water-
soluble quantum dot, a latex bead, a selenium particle, or a europium nanoparticle.

In one embodiment, said post-translational modification is phosphorylation, 
and said secondary capture agent is a labeled secondary antibody specific for 
phosphorylated tyrosine, phosphorylated serine, or phosphorylated threonine. 

In one embodiment, said secondary antibody is labeled by an enzyme or a
fluorescent group.

In one embodiment, said enzyme is HRP (horseradish peroxidase).

In one embodiment, said post-translational modification is phosphorylation,
and said secondary capture agent is a fluorescent dye that specifically stains
phosphoamino acids.

In one embodiment, said fluorescent dye is Pro-Q Diamond dye.

In one embodiment, said post-translational modification is glycosylation, and
said labeled secondary capture agent is a labeled lectin specific for one or more
sugar moieties attached to the glycosylation site.

In one embodiment, said post-translational modification is ubiquitination,
and said labeled secondary capture agent is a labeled secondary antibody specific
for ubiquitin.

In one embodiment, said sample contains a billion-fold molar excess of unrelated
proteins or fragments thereof relative to said fragment.

In one embodiment, the method further comprises quantitating the amount of
said fragment bound to said capture agent. 

In one embodiment, step (3) is effectuated by immunizing an animal with an
antigen comprising said PET sequence.

In one embodiment, the N- or C-terminus, or both, of said PET sequence are
blocked to eliminate free N- or C-terminus, or both.

In one embodiment, the N- or C-terminus of said PET sequence is blocked
by fusing the PET sequence to a heterologous carrier polypeptide, or blocked by a
small chemical group.

In one embodiment, said carrier is KLH or BSA.

Another aspect of the invention provides an array of capture agents for
identifying all potential substrates of a kinase within a proteome, comprising a
plurality of capture agents, each immobilized on a distinct addressable location on
a solid support, wherein each of said capture agents specifically binds a PET uniquely
associated with a peptide fragment that predictably results from a treatment of all
proteins within said proteome, and wherein said peptide fragment encompasses one or
more potential phosphorylation sites of said kinase.

In one embodiment, said solid support is beads or an array device in a 
manner that encodes the identity of said capture agents disposed thereon. 

In one embodiment, said array includes 100 or more different capture agents.

In one embodiment, said array device includes a diffractive grating surface. 

In one embodiment, said capture agents are antibodies or antigen binding 
portions thereof, and said array is an arrayed ELISA. 

In one embodiment, said array device is a surface plasmon resonance array.

In one embodiment, said beads are encoded as a virtual array.

Another aspect of the invention provides a method of identifying, in a
sample, potential substrates of a kinase, comprising: (1) computationally analyzing
amino acid sequences of all proteins in a proteome to identify all candidate
phosphorylation sites for said kinase; (2) computationally identifying all peptide
fragments encompassing one or more of said candidate phosphorylation sites, wherein
said fragments predictably result from a treatment of all proteins within said proteome;
(3) for each of said fragments identified in (2), identifying one PET unique to said
fragment within said sample; (4) obtaining capture agents specific for each PET
identified in (3), respectively, and immobilizing said capture agents to generate the
array of the subject invention; (5) contacting said array of capture agents with a
sample of said proteome subjected to said treatment; and (6) detecting the presence
of phosphorylated residues within any fragments bound to said capture agents, if
any, wherein the presence of phosphorylated residues within a specific fragment
bound to a specific capture agent is indicative that the protein from which said
specific fragment is derived is a substrate of said kinase.

In one embodiment, said proteome is a human proteome. 

In one embodiment, said candidate phosphorylation sites are predicted based 
on the consensus sequence of phosphorylation by said kinase. 
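
By way of non-limiting illustration only, prediction of candidate phosphorylation sites from a consensus sequence may be sketched in Python as below. The motif R-R-x-[S/T] is a hypothetical example written as a regular expression, with the phospho-acceptor assumed to be the last residue of the motif; in practice the consensus would be taken from a phosphorylation site database for the kinase of interest. The function names and toy sequences are likewise hypothetical.

```python
import re

# Hypothetical consensus for illustration only: R-R-x-[S/T].
EXAMPLE_MOTIF = r"RR.[ST]"

def candidate_sites(sequence, motif=EXAMPLE_MOTIF):
    """0-based positions of the last residue of every motif match.

    A lookahead is used so that overlapping matches are all reported.
    """
    lookahead = re.compile(f"(?=({motif}))")
    return [m.start() + len(m.group(1)) - 1 for m in lookahead.finditer(sequence)]

def scan_proteome(proteome, motif=EXAMPLE_MOTIF):
    """Map protein name -> candidate site positions, omitting proteins with none."""
    hits = {}
    for name, sequence in proteome.items():
        sites = candidate_sites(sequence, motif)
        if sites:
            hits[name] = sites
    return hits

# Toy usage with made-up sequences.
toy_proteome = {"protA": "MRRASLGRRKTA", "protB": "MKTAY"}
predicted = scan_proteome(toy_proteome)
```

Each predicted position can then be passed to the fragment and PET identification of steps (2) and (3).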

In one embodiment, said consensus sequence is obtained from a
phosphorylation site database.

In one embodiment, said sample is pre-treated by an agent that is a known 
agonist of said kinase, or a known agonist of the signaling pathway to which said 
kinase belongs. 

In one embodiment, said treatment is carried out under conditions to preserve
phosphorylation.

In one embodiment, the method further comprises verifying phosphorylation 
of said identified substrate by said kinase in vitro or in vivo. 

In one embodiment, said proteome and said kinase are from the same
organism.

In one embodiment, step (6) is effectuated by using a labeled secondary
capture agent specific for phosphorylated residues.

Another aspect of the invention provides an array of capture agents for
identifying all potential substrates of an enzyme catalyzing post-translational
modification within a proteome, comprising a plurality of capture agents, each
immobilized on a distinct addressable location on a solid support, wherein each of said
capture agents specifically binds a PET uniquely associated with a peptide fragment that
predictably results from a treatment of all proteins within said proteome, and wherein
said peptide fragment encompasses one or more potential post-translational
modification sites of said enzyme.

Another aspect of the invention provides a method of identifying, in a
sample, potential substrates of an enzyme that catalyzes a post-translational
modification selected from acetylation, amidation, deamidation, prenylation,
formylation, glycosylation, hydroxylation, methylation, myristoylation,
phosphorylation, ubiquitination, ribosylation or sulphation, comprising: (1)
computationally analyzing amino acid sequences of all proteins in a proteome to
identify all candidate post-translational modification sites for said enzyme; (2)
computationally identifying all peptide fragments encompassing one or more of said
candidate post-translational modification sites, wherein said fragments predictably result
from a treatment of all proteins within said proteome; (3) for each of said fragments
identified in (2), identifying one PET unique to said fragment within said sample;
(4) obtaining capture agents specific for each PET identified in (3), respectively, and
immobilizing said capture agents in the array of the subject invention; (5) contacting
said array of capture agents with a sample of said proteome subjected to said
treatment; and (6) detecting the presence of residues with said post-translational
modification within any fragments bound to said capture agents, if any, wherein the
presence of residues with said post-translational modification within a specific
fragment bound to a specific capture agent is indicative that the protein from which
said specific fragment is derived is a substrate of said enzyme.

Another aspect of the invention provides an array of capture agents for
determining which, if any, of a selected number of signal transduction pathways
within a proteome is activated or inhibited in response to a stimulation, comprising:
a plurality of capture agents, each immobilized on a distinct addressable location on
a solid support, wherein each of said capture agents specifically binds a unique PET
associated with a peptide fragment that predictably results from a treatment of one or
more key proteins of said signal transduction pathways, and said peptide fragment
encompasses one or more sites predictably post-translationally modified upon activation or
inhibition of said pathway; wherein each of said signal transduction pathways is
represented by one or more said key proteins.

In one embodiment, said signal transduction pathways are immune pathways
activated by IL-4, IL-13, or Toll-like receptor; seven-transmembrane receptor
pathways activated by adrenergic, PAC1 receptor, Dictyostelium discoideum cAMP
chemotaxis, Wnt/Ca2+/cGMP, or G Protein-independent seven transmembrane
receptor; circadian rhythm pathway of murine or Drosophila; insulin pathway; FAS
pathway; TNF pathway; G-Protein coupled receptor pathways; integrin pathways;
mitogen-activated protein kinase pathways of MAPK, JNK, or p38; estrogen
receptor pathway; phosphoinositide 3-kinase pathway; Transforming Growth Factor-β
(TGF-β) pathway; B Cell antigen receptor pathway; Jak-STAT pathway; STAT3
pathway; T Cell signal transduction pathway; Type 1 Interferon (α/β) pathway;
jasmonate biochemical pathway; or jasmonate signaling pathway.

In one embodiment, said proteome is that of human, mouse, rat, frog
(Xenopus), fish (zebra fish), fly (Drosophila melanogaster), nematode (C. elegans),
fission or budding yeast, or plant (Arabidopsis thaliana). 

In one embodiment, said post-translational modification is phosphorylation 
on a tyrosine, a serine, or a threonine residue. 

In one embodiment, said stimulation is treatment of cells by a growth factor,
a cytokine, a hormone, a steroid, a lipid, an antigen, a small molecule (Ca2+, cAMP,
cGMP), an osmotic shock, a heat or cold shock, a pH change, a change in ionic 
strength, a mechanical force, a viral or bacterial infection, or an attachment or 
detachment from a neighboring cell or a surface with or without a coated protein. 

In one embodiment, activation or inhibition of at least one of said signal
transduction pathways is manifested by a type of post-translational modification
different from those of other signal transduction pathways.

In one embodiment, at least 3, 5, 10, 20, 50, 100, 200, 500, or 1000 signaling 
pathways are represented. 

In one embodiment, signaling pathways of at least two different organisms 
are represented.

In one embodiment, similar signaling pathways of different organisms are 
represented. 

In one embodiment, all capture agents are specific for proteins belonging to 
the same signal transduction pathway, and wherein all proteins of said signal 
transduction pathway that are predictably post-translationally modified are
represented. 

In one embodiment, one or more of said key proteins are post-translationally 
modified upon activation or inhibition of at least two of said signal transduction 
pathways. In this embodiment, the status of post-translational modification of these 
key proteins may indicate cross-talk between different, or even seemingly irrelevant,
signaling pathways, since signals converge to these key proteins from many 
different pathways. 

In one embodiment, the array further includes instructions for: (1) denaturing
and/or fragmenting a sample containing polypeptide analytes, in a way
compatible with the array; and (2) detecting interaction of said polypeptide analytes or
fragments thereof with said capture agents.

In one embodiment, the instructions further include one or more of: data for
calibration procedures and preparation procedures, and statistical data on
performance characteristics of the capture agents.

In one embodiment, the array has a recovery rate of at least 50 percent.

In one embodiment, the array has an overall positive predictive value for 
occurrence of proteins in said sample of at least 90 percent. 

In one embodiment, the array has an overall diagnostic sensitivity (DSN) for
occurrence of proteins in said sample of 99 percent or higher.

In one embodiment, said array comprises at least 1,000 or 10,000 different
capture agents bound to said support.

In one embodiment, said capture agents are bound to said support at a 
density of 100 capture agents/cm².

In one embodiment, the array further includes one or more labeled reference
peptides including PET portions that bind to said capture agents, wherein said
binding of said capture agents with said polypeptide analytes is detected by a
competitive binding assay with said reference peptides.

In one embodiment, the addressable array is a collection of beads, each of
which comprises a discrete species of capture agent and one or more labels which
identify the bead.

Another aspect of the invention provides a method of using the array of the
subject invention for determining which, if any, of a selected number of signal
transduction pathways within a sample from a proteome is activated or inhibited in
response to a stimulation, comprising: (1) subjecting said sample to said stimulation;
(2) subjecting said sample to the treatment of the subject invention to render said
peptide fragment of the subject invention soluble in solution; (3) contacting said
sample after said treatment to the array of the subject invention; and (4) detecting the
presence, and/or quantitating the amount, of post-translationally modified residues
within any fragments bound to said capture agents, if any, wherein a change in the
presence and/or amount of post-translationally modified residues within a specific
fragment bound to a specific capture agent on said array, after said stimulation, is
indicative that the signal transduction pathway represented by said specific fragment
is activated or inhibited.
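
By way of non-limiting illustration only, the comparison in step (4) may be sketched in Python as follows. The sketch assumes that each array location yields one numeric signal for modified residues before and after stimulation, and that a two-fold change is treated as meaningful; the threshold, the function name and the location identifiers are hypothetical. Whether an increase corresponds to activation or to inhibition depends on the pathway, so the sketch reports only the direction of change.

```python
def ptm_changes(before, after, fold_threshold=2.0, floor=1e-9):
    """Classify per-location PTM signal changes after stimulation.

    `before` and `after` map an array location to its signal. Locations
    whose signal changes by less than `fold_threshold` are omitted.
    `floor` avoids division by zero for empty locations.
    """
    calls = {}
    for location, baseline in before.items():
        ratio = max(after.get(location, 0.0), floor) / max(baseline, floor)
        if ratio >= fold_threshold:
            calls[location] = "increased"
        elif ratio <= 1.0 / fold_threshold:
            calls[location] = "decreased"
    return calls

# Toy usage with made-up signals for three array locations.
unstimulated = {"spot_A": 100.0, "spot_B": 80.0, "spot_C": 50.0}
stimulated = {"spot_A": 450.0, "spot_B": 85.0, "spot_C": 10.0}
changed = ptm_changes(unstimulated, stimulated)
```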

In one embodiment, said stimulation is effectuated by a candidate analog of a
drug, and wherein activation or inhibition of a specific signal transduction pathway
is monitored.

In one embodiment, said specific signal transduction pathway is one that is 
affected by said drug. 

In one embodiment, the method further comprises comparing the degree of
activation / inhibition of said specific signal transduction pathway by said analog
and said drug.

In one embodiment, said specific signal transduction pathway is one that 
mediates a side effect of said drug. 

Another aspect of the invention provides a business method for a
biotechnology or pharmaceutical business, the method comprising: (i) identifying,
using the method of the subject invention, one or more substrates for an enzyme
catalyzing a post-translational modification; (ii) optionally, verifying the post-
translational modification of said substrates by said enzyme; (iii) licensing to a third
party the right to manufacture, or explore the use of said substrate as a target of said
enzyme.

Another aspect of the invention provides a business method for providing
protein detection arrays for identifying substrates of a post-translational modification
enzyme, the method comprising: (i) identifying, within a proteome, one or more
protein(s) or fragments thereof that have at least one site for said potential post-
translational modification; (ii) identifying one or more PETs for each of one or more
protein(s) or fragments thereof identified in (i); (iii) generating one or more capture
agent(s) for each of said PETs identified in (ii), each of said capture agent(s)
specifically binding one of said PETs for which said capture agent(s) is generated; (iv)
fabricating arrays of capture agent(s) generated in (iii), wherein each of said capture
agents is bound to a different discrete region or address of said solid support; (v)
packaging said arrays of capture agent(s) in (iv) for use in diagnostic and/or research
experimentation.

In one embodiment, the business method further comprises marketing said 
arrays of capture agent(s).

In one embodiment, the business method further comprises distributing said 
arrays of capture agent(s). 

Another aspect of the invention provides a composition comprising a 
plurality of capture agents, wherein said plurality of capture agents are, collectively, 
capable of specifically interacting with all potential substrates of a post-translational
modification enzyme within an organism's proteome, and wherein each of said
capture agents is able to recognize and interact with only one PET within said
potential substrate or fragment thereof containing the post-translational modification 
site. 

In one embodiment, said capture agents are selected from the group 
consisting of: nucleotides; nucleic acids; PNA (peptide nucleic acids); proteins;
peptides; carbohydrates; artificial polymers; and small organic molecules. 

In one embodiment, said capture agents are antibodies, or antigen binding 
fragments thereof. 

In one embodiment, said capture agent is a full-length antibody, or a 
functional antibody fragment selected from: an Fab fragment, an F(ab')2 fragment,
an Fd fragment, an Fv fragment, a dAb fragment, an isolated complementarity 
determining region (CDR), a single chain antibody (scFv), or derivative thereof. 

In one embodiment, each of said capture agents is a single chain antibody. 

Another aspect of the invention provides a business method for generating
arrays of capture agents for marketing in research and development, the method
comprising: (1) identifying one or more protein(s), a post-translational modification
of which protein(s) represents the activation of at least one signal transduction
pathway within an organism; (2) identifying one or more PETs for each of said
protein(s), or fragment thereof containing at least one site for said post-translational
modification; (3) generating one or more capture agent(s) for each of said PETs
identified in (2), each of said capture agent(s) specifically binding one of said PETs for
which said capture agent(s) is generated; (4) fabricating arrays of capture agent(s)
generated in (3) on solid support, wherein each of said capture agents is bound to a
different discrete region of said solid support; (5) packaging said arrays of capture
agent(s) in (4) for diagnosis and/or research use in commercial and/or academic
laboratories.

In one embodiment, the business method further comprises marketing said 
arrays of capture agent(s) in (4) or said packaged arrays of capture agent(s) in (5) to 
potential customers and/or distributors. 

In one embodiment, the business method further comprises distributing said
arrays of capture agent(s) in (4) or said packaged arrays of capture agent(s) in (5) to
customers and/or distributors. 

Another aspect of the invention provides a business method for generating 
arrays of capture agents for marketing in research and development, the method 
comprising: (1) identifying one or more protein(s), a post-translational modification
of which protein(s) represents the activation of at least one signal transduction
pathway within an organism; (2) identifying one or more PETs for each of said
protein(s), or fragment thereof containing at least one site for said post-translational
modification; (3) licensing to a third party the right to manufacture or use said one or
more PET(s) identified in (2).

Another aspect of the invention provides a method of immunizing a host 
animal against a disease condition associated with the presence or overexpression of 
a protein, comprising: (1) computationally analyzing the amino acid sequence of 
said protein to identify one or more PET(s) unique to said protein within the 
proteome of said host animal; (2) administering said one or more PET(s) identified
in (1) to said host animal as an immunogen.

In one embodiment, said one or more PET(s) is administered to said host 
animal in a formulation designed to enhance the immune response of said host 
animal. 

In one embodiment, said formulation comprises liposomes with or without
additional adjuvants selected from: lipopolysaccharide (LPS), lipid A, muramyl
dipeptide (MDP), glucan or cytokine.

In one embodiment, said cytokine is an interleukin, an interferon, or a
colony stimulating factor.

In one embodiment, said formulation comprises a viral or bacterial vector
encoding said one or more PET(s).

In one embodiment, said protein is from an organism different from the host
animal.

In one embodiment, said protein is from a tumor cell, an infectious agent or a 
parasitic agent.




WO 2005/078453 


PCT/US2005/003634 


In one embodiment, said infectious agent is SARS virus. 

Another aspect of the invention provides a method of generating antibodies 
specific for a marker protein for use in immunohistochemistry, the method 
comprising computationally analyzing the amino acid sequence of said marker 
protein to identify one or more PET(s) unique to said marker protein, wherein said
PET(s) is located on the surface of said marker protein. 

In one embodiment, said PET(s) excludes residues known to form cross- 
links under the fixation condition to be used in immunohistochemistry. 

Another aspect of the invention provides a method for simultaneous
unambiguous detection / quantification of a family of related proteins in a sample,
comprising: (1) computationally analyzing amino acid sequences for said family of
related proteins expected to be present in a sample of proteins, and identifying a
common PET sequence unique to said family of proteins; (2) generating a
capture agent that selectively and specifically binds said common PET; (3)
contacting said sample with said capture agent generated in (2); and (4) detecting the
presence and/or measuring the amount of proteins bound to said capture agent,
thereby simultaneously detecting / quantifying said family of related proteins in said
sample.
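
By way of non-limiting illustration only, step (1) above may be sketched in Python as follows. The sketch assumes that a common PET is a fixed-length window of k residues present in every family member and absent from all other proteins expected in the sample. The function names and the toy sequences are hypothetical.

```python
def kmers(sequence, k):
    """All k-residue windows of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def family_pets(family, background, k=5):
    """Windows shared by every family member and absent from the background."""
    if not family:
        raise ValueError("family must contain at least one sequence")
    shared = set.intersection(*(kmers(s, k) for s in family))
    background_kmers = set().union(*(kmers(s, k) for s in background))
    return sorted(shared - background_kmers)

# Toy usage: two made-up family members and one unrelated protein.
family = ["AAGHWTEYKK", "CCGHWTEYLL"]
background = ["MMWTEYKQQ"]
common = family_pets(family, background, k=4)
```

The embodiment in which each family member is also detected individually would instead keep windows found in exactly one member and in no other protein of the sample.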

In one embodiment, said family of related proteins are denatured and 
digested by protease or chemical agents prior to step (3).

In one embodiment, the method further comprises identifying at least one 
PET unique to each member of said family of related proteins to facilitate detection / 
quantification of said each member. 

In one embodiment, said family of related proteins comprises a family of 
related kinases or cytokines.

In one embodiment, said sample is a body fluid selected from: saliva, 
mucous, sweat, whole blood, serum, urine, amniotic fluid, genital fluid, fecal 
material, marrow, plasma, spinal fluid, pericardial fluid, gastric fluid, abdominal 
fluid, peritoneal fluid, pleural fluid, synovial fluid, cyst fluid, cerebrospinal fluid, 
lung lavage fluid, lymphatic fluid, tears, prostatic fluid, extraction from other body
parts, or secretion from other glands; or from supernatant, whole cell lysate, or cell
fraction obtained by lysis and fractionation of cellular material, extract or fraction of 
cells obtained directly from a biological entity or cells grown in an artificial 
environment. 

Another aspect of the invention provides a method of processing a sample
for use in PET-associated detection / quantitation of a target protein therein, the
method comprising denaturing all proteins of said sample, and/or fragmenting all
proteins of said sample by a protease, a chemical agent, physical shearing, or
sonication.

In one embodiment, said denaturation is thermo-denaturation or chemical
denaturation.

In one embodiment, said thermo-denaturation is followed by or concurrent 
with proteolysis using thermo-stable proteases. 

In one embodiment, said thermo-denaturation comprises two or more cycles 
of thermo-denaturation followed by protease digestion.

In one embodiment, each of said two or more cycles of thermo-denaturation 
is carried out by denaturing at about 90°C followed by protease digestion at about 
50°C. 

In one embodiment, said fragmentation is carried out by a protease
selected from trypsin, chymotrypsin, pepsin, papain, carboxypeptidase, calpain,
subtilisin, Glu-C, endo Lys-C, or proteinase K.

In one embodiment, said sample is a body fluid selected from: saliva, 
mucous, sweat, whole blood, serum, urine, amniotic fluid, genital fluid, fecal 
material, marrow, plasma, spinal fluid, pericardial fluid, gastric fluid, abdominal 
fluid, peritoneal fluid, pleural fluid, synovial fluid, cyst fluid, cerebrospinal fluid,
lung lavage fluid, lymphatic fluid, tears, prostatic fluid, extraction from other body
parts, or secretion from other glands; or from supernatant, whole cell lysate, or cell
fraction obtained by lysis and fractionation of cellular material, extract or fraction of
cells obtained directly from a biological entity or cells grown in an artificial
environment.

In one embodiment, said target protein forms or tends to form complexes or
aggregates with other proteins within said sample.

In one embodiment, said target protein is a TGF-beta protein. 

Another aspect of the invention provides a SARS virus-specific PET amino 
acid sequence as listed in Table SARS.

Another aspect of the invention provides a method of generating antibodies 
specific for a PET sequence, the method comprising: (1) administering to an animal 
a peptide immunogen comprising said PET sequence; (2) screening for antibodies 
specific for said PET sequence using a peptide fragment comprising said PET 
sequence, said peptide fragment predictably results from a treatment of a protein
comprising said PET sequence. 

In one embodiment, said peptide immunogen consists essentially of said PET 
sequence. 

In one embodiment, the N- or C-terminus, or both, of said PET sequence are 
blocked to eliminate free N- or C-terminus, or both.

In one embodiment, more than one peptide immunogen, each comprising a
PET sequence, is administered to said animal.

In one embodiment, said more than one peptide immunogens encompass
PET sequences derived from different proteins.

In one embodiment, said peptide immunogen comprises more than one PET
sequence.

In one embodiment, said more than one PET sequences are linked by short 
linker sequences. 

In one embodiment, said more than one PET sequences are derived from 
25 different proteins. 

Another aspect of the invention provides a method for achieving high 
sensitivity detection and/or high accuracy quantitation of a target protein in a 
biological sample, comprising: (1) providing two or more different capture agents 
for detecting a target protein in a test sample, which capture agents are provided as 
an addressable array, and each of which capture agents selectively interacts with a 
peptide epitope tag (PET) of said target protein; (2) contacting said array with a 
solution of polypeptide analytes produced by denaturation and/or cleavage of 
proteins from the test sample; (3) detecting the presence and amount of said target 
protein in the sample from the interaction of said polypeptide analytes with each of 
said capture agents; and (4) quantitating, if present, the amount of the target protein 
in the sample by averaging the results obtained from each of said capture agents in (3). 
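The averaging in step (4) can be illustrated with a minimal sketch. It assumes that each capture agent's signal has already been converted to an amount through that agent's own standard curve; the agent labels, the units (pM), and the added coefficient of variation are illustrative assumptions, not part of the described method.

```python
from statistics import mean, stdev

def quantitate_target(readings):
    """Average per-capture-agent estimates of one target protein.

    `readings` maps a (hypothetical) capture-agent label to the amount of
    target protein inferred from that agent alone, e.g. in pM.
    Returns the mean amount and the coefficient of variation (CV).
    """
    values = list(readings.values())
    if len(values) < 2:
        raise ValueError("two or more different capture agents are required")
    avg = mean(values)
    # CV is an added consistency check (an assumption of this sketch).
    cv = stdev(values) / avg if avg else float("nan")
    return avg, cv

# Three capture agents, each binding a different PET of the same protein.
readings = {"anti-PET1": 4.8, "anti-PET2": 5.2, "anti-PET3": 5.0}
amount, cv = quantitate_target(readings)
print(f"target protein: {amount:.2f} pM (CV {cv:.1%})")
```

A large CV across agents would suggest that one PET is masked, modified, or poorly recovered, which is one reason to measure the same protein through several independent PETs.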

In one embodiment, each of said different capture agents specifically binds a 
different PET of said target protein. 

In one embodiment, said different capture agents belong to the same 
category of capture agent. 

In one embodiment, said category of capture agent includes: antibody, 
non-antibody polypeptide, PNA (peptide nucleic acids), scaffolded peptide, 
peptidomimetic compound, polynucleotide, carbohydrates, artificial polymers, 
plastibody, chimeric binding agent derived from low-affinity ligand, and small 
organic molecules. 

In one embodiment, at least two of said different capture agents belong to 
different categories of capture agent selected from antibody, non-antibody 
polypeptide, PNA (peptide nucleic acids), scaffolded peptide, peptidomimetic 
compound, polynucleotide, carbohydrates, artificial polymers, plastibody, chimeric 
binding agent derived from low-affinity ligand, and small organic molecules. 

In one embodiment, a subset of said capture agents bind to the same PET, 
and wherein each capture agent of said subset belongs to a different category of 
capture agent selected from: antibody, non-antibody polypeptide, PNA (peptide 
nucleic acids), scaffolded peptide, peptidomimetic compound, polynucleotide, 
carbohydrates, artificial polymers, plastibody, chimeric binding agent derived from 
low-affinity ligand, and small organic molecules. 

In one embodiment, said target protein has two or more different forms 
within said biological sample. 

In one embodiment, said different forms include unprocessed / pro-form and 
processed / mature form. 

In one embodiment, said different forms include different alternative splicing 
forms. 

In one embodiment, said different forms include unmodified and 
post-translationally modified forms with respect to one or more post-translational 
modification(s). 

In one embodiment, said post-translational modification includes: 
acetylation, amidation, deamidation, prenylation, formylation, glycosylation, 
hydroxylation, methylation, myristoylation, phosphorylation, ubiquitination, 
ribosylation and sulphation. 

In one embodiment, a subset of said capture agents are specific for PET(s) 
only found in certain forms but not in other forms. 

In one embodiment, the method further comprises determining the percentage 
of one form of said target protein as compared to the total target protein, or the ratio 
of a first form of said target protein to a second form of said target protein. 
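The percentage and ratio described above reduce to simple arithmetic once each form has been quantitated. The sketch below assumes that one capture agent binds a PET common to all forms (giving the total) and another binds a PET found only in one form; the function names and the example amounts are hypothetical.

```python
def form_percentage(form_amount, total_amount):
    """Percentage of one form relative to the total target protein."""
    if total_amount <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * form_amount / total_amount

def form_ratio(first_form, second_form):
    """Ratio of a first form to a second form of the target protein."""
    if second_form <= 0:
        raise ValueError("second form amount must be positive")
    return first_form / second_form

# Hypothetical amounts in pM: total from a form-independent PET,
# phosphorylated form from a form-specific PET.
total = 8.0
phospho = 2.0
unmodified = total - phospho

print(form_percentage(phospho, total))   # share of the phosphorylated form
print(form_ratio(phospho, unmodified))   # phosphorylated : unmodified
```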

In one embodiment, the method further comprises detecting other target 
proteins within said biological sample with capture agents specific for PETs of said 
other target proteins. 

In one embodiment, two or more different capture agents are used for 
detecting and/or quantitating at least one of said other target proteins. 

In one embodiment, for each capture agent, the method has a regression 
coefficient (R²) of 0.95 or greater. 
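As an illustration of how such a criterion might be checked, the sketch below computes R² for a least-squares line fitted to a capture agent's standard curve. It assumes that R² here means the coefficient of determination of a linear fit of signal against known concentration; the calibration values are invented for the example.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical calibration points: spiked concentration (pM) vs. signal.
conc = [1, 2, 5, 10, 20]
signal = [105, 198, 510, 1020, 1985]
r2 = r_squared(conc, signal)
print(f"R^2 = {r2:.4f}; meets 0.95 criterion: {r2 >= 0.95}")
```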

In one embodiment, the array has a recovery rate of at least 50 percent. 

In one embodiment, the accuracy is 90%. 

In one embodiment, said sample is a body fluid selected from: saliva, 
mucous, sweat, whole blood, serum, urine, amniotic fluid, genital fluid, fecal 
material, marrow, plasma, spinal fluid, pericardial fluid, gastric fluid, abdominal 
fluid, peritoneal fluid, pleural fluid, synovial fluid, cyst fluid, cerebrospinal fluid, 
lung lavage fluid, lymphatic fluid, tears, prostatic fluid, extraction from other body 
parts, or secretion from other glands; or from supernatant, whole cell lysate, or cell 
fraction obtained by lysis and fractionation of cellular material, extract or fraction of 
cells obtained directly from a biological entity or cells grown in an artificial 
environment. 

In one embodiment, said sample is obtained from human, mouse, rat, frog 
(Xenopus), fish (zebra fish), fly (Drosophila melanogaster), nematode (C. elegans), 
fission or budding yeast, or plant (Arabidopsis thaliana). 

In one embodiment, said sample is produced by treatment of membrane 
bound proteins. 

In one embodiment, step (3) is effectuated by directly detecting and 
measuring captured PET-containing polypeptides using mass spectrometry, 
colorimetric resonant reflection using a SWS or SRVD biosensor, surface plasmon 
resonance (SPR), interferometry, gravimetry, ellipsometry, an evanescent wave 
device, resonance light scattering, reflectometry, a fluorescent polymer 
superquenching-based bioassay, or arrays of nanosensors comprising nanowires or 
nanotubes. 

In one embodiment, step (3) is effectuated by using secondary capture agents 
specific for captured polypeptide analytes, wherein said secondary capture agent is 
labeled by a detectable moiety selected from: an enzyme, a fluorescent label, a 
stainable dye, a chemiluminescent compound, a colloidal particle, a radioactive 
isotope, a near-infrared dye, a DNA dendrimer, a water-soluble quantum dot, a latex 
bead, a selenium particle, or a europium nanoparticle. 

In one embodiment, said secondary capture agent is specific for a 
post-translational modification. 

In one embodiment, said secondary capture agent is a labeled secondary 
antibody specific for phosphorylated tyrosine, phosphorylated serine, or 
phosphorylated threonine. 

In one embodiment, said sample contains a billion-fold molar excess of unrelated 
proteins or fragments thereof relative to said target protein. 

In one embodiment, said PET is identified based on one or more of the 
protein sources selected from: sequenced genome or virtually translated proteome, 
virtually translated transcriptome, or mass spectrometry database of tryptic 
fragments. 

In one embodiment, the target protein is a biomarker with a concentration of 
about 1-5 pM in said sample. 

In one embodiment, the target protein is a biomarker with a relatively small 
concentration change of no more than 50%, 40%, 30%, 20%, 10%, 5%, or 1% in a 
disease sample. 

Another aspect of the invention provides an array of capture agents for 
detecting and quantitating a target protein within a biological sample, comprising a 
plurality of capture agents, each immobilized on a distinct addressable location on a 
solid support, wherein each of said capture agents specifically binds a PET uniquely 
associated with a peptide fragment of said target protein that predictably results from 
a treatment of said biological sample. 

In one embodiment, said solid support is beads or an array device in a 
manner that encodes the identity of said capture agents disposed thereon. 

In one embodiment, said array includes 2 - 100 or more different capture 
agents. 

In one embodiment, said array device includes a diffractive grating surface. 

In one embodiment, said capture agents are antibodies or antigen binding 
portions thereof, and said array is an arrayed ELISA. 

In one embodiment, said array device is a surface plasmon resonance array. 

In one embodiment, said beads are encoded as a virtual array. 

Another aspect of the invention provides a composition comprising a 
plurality of capture agents, wherein each of said capture agents recognizes and 
interacts with one PET of a target protein. 

In one embodiment, each of said capture agents is independently selected from: 
antibody, non-antibody polypeptide, PNA (peptide nucleic acids), scaffolded 
peptide, peptidomimetic compound, polynucleotide, carbohydrates, artificial 
polymers, plastibody, chimeric binding agent derived from low-affinity ligand, and 
small organic molecules. 

In one embodiment, said capture agents are antibodies, or antigen binding 
fragments thereof. 

In one embodiment, said capture agent is a full-length antibody, or a 
functional antibody fragment selected from: an Fab fragment, an F(ab')2 fragment, 
an Fd fragment, an Fv fragment, a dAb fragment, an isolated complementarity 
determining region (CDR), a single chain antibody (scFv), or derivative thereof. 

In one embodiment, each of said capture agents is a single chain antibody. 

It is also contemplated that all embodiments of the invention, including those 
specifically described for different aspects of the invention, can be combined with 
any other embodiments of the invention as appropriate. 

Other features and advantages of the invention will be apparent from the 
following detailed description and claims. 

Brief Description of the Drawings 

Figure 1 depicts the sequence of the Interleukin-8 receptor A and the 
pentamer unique recognition sequences (URS) or PETs within this sequence. 

Figure 2 depicts the sequence of the Histamine H1 receptor and the pentamer 
unique recognition sequences (URS) or PETs within this sequence that are not 
destroyed by trypsin digestion. 

Figure 3 is an alternative format for the parallel detection of PETs from a 
complex sample. In this type of "virtual array," each of many different beads 
displays a capture agent directed against a different PET. Each different bead is 
color-coded by covalent linkage of two dyes (dye1 and dye2) at a characteristic 
ratio. Only two different beads are shown for clarity. Upon application of the 
sample, the capture agent binds a cognate PET, if present in the sample. Then a 
mixture of secondary binding ligands (in this case labeled PET peptides) conjugated 
to a third fluorescent tag (dye3) is applied to the mixture of beads. The beads can 
then be analyzed using flow cytometry or another detection method that can resolve, 
on a bead-by-bead basis, the ratio of dye1 and dye2 and thus identify the PET 
captured on the bead, while the fluorescence intensity of dye3 is read to quantitate 
the amount of labeled PET on the bead (which will inversely reflect the analyte PET 
level). 
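The bead-by-bead readout of such a virtual array can be sketched as follows. The sketch assumes a codebook of nominal dye1/dye2 ratios per PET and a dye3 reference intensity measured on beads that saw no sample; the codebook, the reference value, and the linear occupancy estimate are all hypothetical simplifications.

```python
def decode_bead(dye1, dye2, dye3, codebook, blank_dye3):
    """Identify a bead's PET and estimate relative analyte occupancy.

    codebook   : maps PET name -> nominal dye1/dye2 ratio (assumed known)
    blank_dye3 : dye3 intensity on a bead with no analyte (assumed control)
    """
    ratio = dye1 / dye2
    # Assign the bead to the PET whose nominal ratio is closest.
    pet = min(codebook, key=lambda name: abs(codebook[name] - ratio))
    # Competitive format: analyte PET blocks the labeled PET peptide, so a
    # lower dye3 signal corresponds to a higher analyte level.
    occupancy = 1.0 - min(dye3 / blank_dye3, 1.0)
    return pet, occupancy

codebook = {"PET-A": 0.5, "PET-B": 2.0}
pet, occupancy = decode_bead(dye1=100, dye2=210, dye3=300,
                             codebook=codebook, blank_dye3=1000)
print(pet, round(occupancy, 2))
```

In practice the occupancy would be converted to a concentration through a competition standard curve rather than read directly.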

Figure 4 illustrates the result of extraction of intracellular and membrane 
proteins. Top Panel: M: Protein Size Marker; H-S: HELA-Supernatant; H-P: HELA- 
Pellet; M-S: MOLT4-Supernatant; M-P: MOLT4-Pellet. Bottom panel shows that 
>90% of the proteins are solubilized. Briefly, cells were washed in PBS, then 
suspended (5 x 10^6 cells/ml) in a buffer with 0.5% Triton X-100 and homogenized in 
a Dounce homogenizer (30 strokes). The homogenized cells were centrifuged to 
separate the soluble portion and the pellet, which were both loaded onto the gel. 

Figure 5 illustrates the process for PET-specific antibody generation. 

Figure 6 illustrates a general scheme of sample preparation prior to its use in 
the methods of the instant invention. The left side shows the process for chemical 
denaturation followed by protease digestion; the right side illustrates the preferred 
thermo-denaturation and fragmentation. Although the most commonly used protease, 
trypsin, is depicted in this illustration, any other suitable proteases described in the 
instant application may be used. The process is simple, robust & reproducible, and is 
generally applicable to the main sample types including serum, cell lysates and tissues. 

Figure 7 provides an illustrative example of serum sample pre-treatment 
using either the thermo-denaturation or the chemical denaturation as described in 
Figure 6. 

Figure 8 shows the result of thermo-denaturation and chemical denaturation 
of serum proteins and cell lysates (MOLT4 and Hela cells). 

Figure 9 illustrates the structure of mature TGF-beta dimer, and one complex 
form of mature TGF-beta with LAP and LTBP. 

Figure 10 depicts PET-based array for (AKT) kinase substrate identification. 

Figure 11 illustrates a general approach to identify all PETs of a given length 
in an organism with a sequenced genome or a sample with a known proteome. Although 
in this illustrative figure the protein sequences are parsed into overlapping peptides 
of 4-10 amino acids in length to identify PETs of 4-10 amino acids, the same 
scheme can be used for PETs of any other length. 

Figure 12 lists the results of searching the whole human proteome (a total of 
29,076 proteins, which correspond to about 12 million overlapping peptides of 4-10 
amino acids) for PETs, and the number of PETs identified for each N between 4-10. 

Figure 13 shows the percentage of human proteins that have at least 
one PET. 

Figure 14 provides further data resulting from tryptic digest of the human 
proteome. 

Figure 15 illustrates a schematic drawing of a fluorescence sandwich 
immunoassay for specific capture and quantitation of a targeted peptide in a 
complex peptide mixture, and results of the readout fluorescent signal detected by the 
secondary antibody. 

Figure 16 illustrates the sandwich assay used to detect a tagged-human PSA 
protein. 

Figure 17 illustrates the PETs and their nearest neighbors for the detection of 
phospho-peptides in SHIP-2 and ABL. 

Figure 18 illustrates a general approach to use the sandwich assay for 
detecting N proteins with N+1 PET-specific antibodies. 

Figure 19 illustrates the common PETs and kinase-specific PETs useful for 
the detection of related kinases. 

Figure 20 shows two SARS-specific PETs and their nearest neighbors in 
both the human proteome and the related Coronaviruses. 

Figure 21 shows a design for the PET-based assay for standardized serum 
TGF-beta measurement. 

Figure 22 is a schematic drawing showing the general principle of detecting 
PET-associated protein modification using a sandwich assay. 

Figure 23 is a schematic diagram of one embodiment of the detection of 
post-translational modification (e.g., phosphorylation or glycosylation). A target 
peptide is digested by a protease, such as trypsin, to yield smaller, PET-containing 
fragments. One of the fragments (PTP2) also contains at least one modification of 
interest. Once the fragments are isolated by capture agents on a support, the presence 
of phosphorylation can be detected by, for example, HRP-conjugated anti-phospho- 
amino acid antibodies; and the presence of sugar modification can be detected by, 
for example, lectin. 

Figure 24 illustrates that PET-specific antibodies are highly specific for the 
PET antigen and do not bind the nearest neighbors of the PET antigen. 

Detailed Description of the Invention 

The present invention provides methods, reagents and systems for detecting, 
e.g., globally detecting, the presence of a protein or a panel of proteins, especially 
proteins with a specific type of modification (phosphorylation, glycosylation, 
alternative splicing, mutation, etc.) in a sample. In certain embodiments, the method 
may be used to quantitate the level of expression or post-translational modification 
of one or more proteins in the sample. The method includes providing a sample 
which has, preferably, been fragmented and/or denatured to generate a collection of 
peptides, and contacting the sample with a plurality of capture agents, wherein each 
of the capture agents is able to recognize and interact with a unique recognition 
sequence (URS) or PET characteristic of a specific protein or modified state. 
Through detection and deconvolution of binding data, the presence and/or amount of 
a protein in the sample is determined. 

In the first step, a biological sample is obtained. The biological sample as 
used herein refers to any body sample such as blood (serum or plasma), sputum, 
ascites fluids, pleural effusions, urine, biopsy specimens, isolated cells and/or cell 
membrane preparation (see Figure 4). Methods of obtaining tissue biopsies and body 
fluids from mammals are well known in the art. 

Retrieved biological samples can be further solubilized using detergent- 
based or detergent-free (e.g., sonication) methods, depending on the biological 
specimen and the nature of the examined polypeptide (i.e., secreted, membrane- 
anchored or intracellular soluble polypeptide). 

In certain embodiments, the sample may be denatured by detergent-free 
methods, such as thermo-denaturation. This is especially useful in applications 
where detergent needs to be removed or is preferably removed in future analysis. 

In certain embodiments, the solubilized biological sample is contacted with 
one or more proteolytic agents. Digestion is effected under effective conditions and 
for a period of time sufficient to ensure complete digestion of the diagnosed 
polypeptide(s). Agents that are capable of digesting a biological sample under 
moderate conditions in terms of temperature and buffer stringency are preferred. 
Measures are taken to prevent non-specific sample digestion; thus the quantity of 
the digesting agent, reaction mixture conditions (e.g., salinity and acidity), digestion 
time and temperature are carefully selected. At the end of the incubation time, 
proteolytic activity is terminated to avoid non-specific proteolytic activity, which 
may arise from a prolonged digestion period, and to avoid further proteolysis of 
other peptide-based molecules (e.g., protein-derived capture agents), which are added 
to the mixture in following steps. 

If the sample is thermo-denatured, proteases active at high temperatures, such 
as those isolated from thermophilic bacteria, can be used after the denaturation. 

In the next method step, the rendered biological sample is contacted with one 
or more capture agents, which are capable of discriminately binding one or more 
protein analytes through interaction via PET binding, and the products of such 
binding interactions are examined and, as necessary, deconvolved, in order to identify 
and/or quantitate proteins found in the sample. 

The present invention is based, at least in part, on the realization that unique 
recognition sequences (URSs) or PETs, which can be identified by computational 
analysis, can characterize individual proteins in a given sample, e.g., identify a 
particular protein from amongst others and/or identify a particular post- 
translationally modified form of a protein. The use of agents that bind PETs can be 
exploited for the detection and quantitation of individual proteins from a milieu of 
several or many proteins in a biological sample. The subject method can be used to 
assess the status of proteins or protein modifications in, for example, bodily fluids, 
cell or tissue samples, cell lysates, cell membranes, etc. In certain embodiments, the 
method utilizes a set of capture agents which discriminate between splice variants, 
allelic variants and/or point mutations (e.g., altered amino acid sequences arising 
from single nucleotide polymorphisms). 

As a result of the sample preparation, namely denaturation and/or 
proteolysis, the subject method can be used to detect specific proteins / 
modifications in a manner that does not require the homogeneity of the target protein 
for analysis and is relatively refractory to small but otherwise significant differences 
between samples. The methods of the invention are suitable for the detection of all 
or any selected subset of all proteins in a sample, including cell membrane bound 
and organelle membrane bound proteins. 

In certain embodiments, the detection step(s) of the method are not sensitive 
to post-translational modifications of the native protein; while in other embodiments, 
the preparation steps are designed to preserve a post-translational modification of 
interest, and the detection step(s) use a set of capture agents able to discriminate 
between modified and unmodified forms of the protein. Exemplary post- 
translational modifications that the subject method can be used to detect and 
quantitate include acetylation, amidation, deamidation, prenylation (such as 
farnesylation or geranylation), formylation, glycosylation, hydroxylation, 
methylation, myristoylation, phosphorylation, ubiquitination, ribosylation and 
sulphation. In one specific embodiment, the phosphorylation to be assessed is 
phosphorylation on a tyrosine, serine, threonine or histidine residue. In another 
specific embodiment, the addition of a hydrophobic group to be assessed is the 
addition of a fatty acid, e.g., myristate or palmitate, or addition of a glycosyl- 
phosphatidyl inositol anchor. In certain embodiments, the present method can be used 
to assess the protein modification profile of a particular disease or disorder, such as 
infection, neoplasm (neoplasia), cancer, an immune system disease or disorder, a 
metabolism disease or disorder, a muscle and bone disease or disorder, a nervous 
system disease or disorder, a signal disease or disorder, or a transporter disease or 
disorder. 

As used herein, the term "unique recognition sequence," "URS," "Proteome 
Epitope Tag," or "PET" is intended to mean an amino acid sequence that, when 
detected in a particular sample, unambiguously indicates that the protein from which 
it was derived is present in the sample. For instance, a PET is selected such that its 
presence in a sample, as indicated by detection of an authentic binding event with a 
capture agent designed to selectively bind with the sequence, necessarily means that 
the protein which comprises the sequence is present in the sample. A useful PET 
must present a binding surface that is solvent accessible when a protein mixture is 
denatured and/or fragmented, and must bind with significant specificity to a selected 
capture agent with minimal cross reactivity. A unique recognition sequence is 
present within the protein from which it is derived and in no other protein that may 
be present in the sample, cell type, or species under investigation. Moreover, a PET 
will preferably not have any closely related sequence, such as determined by a 
nearest neighbor analysis, among the other proteins that may be present in the 
sample. A PET can be derived from a surface region of a protein, buried regions, 
splice junctions, or post-translationally modified regions. 
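The uniqueness criterion in this definition can be illustrated with a minimal computational sketch. It assumes a toy "proteome" held in a dictionary and treats a candidate PET as any contiguous k-mer found in exactly one protein; the nearest neighbor check uses Hamming distance (number of mismatched positions), which is one plausible reading of "closely related sequence" and is an assumption here, as are all names in the code.

```python
from collections import defaultdict

def find_pets(proteome, k):
    """Return {protein: [k-mers found in that protein and in no other]}."""
    owners = defaultdict(set)
    for name, seq in proteome.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)
    pets = defaultdict(list)
    for kmer, names in owners.items():
        if len(names) == 1:
            pets[next(iter(names))].append(kmer)
    return dict(pets)

def nearest_neighbor_distance(pet, owner, proteome):
    """Smallest Hamming distance from `pet` to any k-mer of another protein."""
    k, best = len(pet), None
    for name, seq in proteome.items():
        if name == owner:
            continue
        for i in range(len(seq) - k + 1):
            d = sum(a != b for a, b in zip(pet, seq[i:i + k]))
            best = d if best is None else min(best, d)
    return best

# Toy two-protein proteome differing at a single residue (hypothetical).
proteome = {"P1": "MKTAYIAK", "P2": "MKTAYLAK"}
pets = find_pets(proteome, k=5)
print(sorted(pets["P1"]))
print(nearest_neighbor_distance("KTAYI", "P1", proteome))
```

A PET whose nearest neighbor differs at only one position is formally unique but may still cross-react, which is why the text prefers PETs with no closely related sequence in the sample.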

Perhaps the ideal PET is a peptide sequence which is present in only one 
protein in the proteome of a species. But a peptide comprising a PET useful in a 
human sample may in fact be present within the structure of proteins of other 
organisms. A PET useful in an adult cell sample is "unique" to that sample even 
though it may be present in the structure of other different proteins of the same 
organism at other times in its life, such as during embryology, or is present in other 
tissues or cell types different from the sample under investigation. A PET may be 
unique even though the same amino acid sequence is present in the sample from a 
different protein, provided one or more of its amino acids are derivatized and a 
binder can be developed which resolves the peptides. 

When referring herein to "uniqueness" with respect to a PET, the reference is 
always made in relation to the foregoing. Thus, within the human genome, a PET 
may be an amino acid sequence that is truly unique to the protein from which it is 
derived. Alternatively, it may be unique just to the sample from which it is derived, 
but the same amino acid sequence may be present in, for example, the murine 
genome. Likewise, when referring to a sample which may contain proteins from 
multiple different organisms, uniqueness refers to the ability to unambiguously 
identify and discriminate between proteins from the different organisms, such as 
being from a host or from a pathogen. 

Thus, a PET may be present within more than one protein in the species, 
provided it is unique to the sample from which it is derived. For example, a PET 
may be an amino acid sequence that is unique to: a certain cell type, e.g., a liver, 
brain, heart, kidney or muscle cell; a certain biological sample, e.g., a plasma, urine, 
amniotic fluid, genital fluid, marrow, spinal fluid, or pericardial fluid sample; a 
certain biological pathway, e.g., a G-protein coupled receptor signaling pathway or a 
tumor necrosis factor (TNF) signaling pathway. 

In this sense, the instant invention provides a method to identify application- 
specific PETs, depending on the type of proteins present in a given sample. This 
information may be readily obtained from a variety of sources. For example, when 
the whole genome of an organism is concerned, the sequenced genome provides 
every protein sequence that can be encoded by this genome, sometimes 
even including hypothetical proteins. This "virtually translated proteome" obtained 
from the sequenced genome is expected to be the most comprehensive in terms of 
representing all proteins in the sample. Alternatively, the type of transcribed mRNA 
species ("virtually translated transcriptome") within a sample may also provide 
useful information as to what type of proteins may be present within the sample. The 
mRNA species present may be identified by DNA microarrays, SNP analysis, or any 
other suitable RNA analysis tools available in the art of molecular biology. An 
added advantage of RNA analysis is that it may also provide information such as 
alternative splicing and mutations. Finally, direct protein analysis using techniques 
such as mass spectrometry may help to identify the presence of specific post- 
translational modifications and mutations, which may aid the design of specific PETs 
for specific applications. For example, WO 03/001879 A2 describes methods for 
determining the phosphorylation status or sulfation state of a polypeptide or a cell 
using mass spectrometry, especially ICP-MS. In a related aspect, mass spectrometry, 
when coupled with separation techniques such as 2-D electrophoresis, GC/LC, etc., 
has provided a wealth of information regarding the profile of expressed proteins in 
specific samples. 

For instance, plasma, the soluble component of the human blood, is believed 
to harbor thousands of distinct proteins, which originate from a variety of cells and 
tissues through either active secretion or leakage from blood cells or tissues. The 
dynamic range of plasma protein concentrations comprises at least nine orders of 
magnitude. Proteins involved in coagulation, immune defense, small molecule 
transport, and protease inhibition, many of them present in high abundance in this 
body fluid, have been functionally characterized and associated with disease 
processes. Pieper et al. (Proteomics 3: 1345-1364, 2003) fractionated blood serum 
proteins prior to display on two-dimensional electrophoresis (2-DE) gels using 
immunoaffinity chromatography to remove the most abundant serum proteins, 
followed by sequential anion-exchange and size-exclusion chromatography. Serum 
proteins from 74 fractions were displayed on 2-DE gels. This approach succeeded in 
resolving approximately 3700 distinct protein spots, many of them post-translationally 
modified variants of plasma proteins. About 1800 distinct serum protein spots were 
identified by mass spectrometry. They collapsed into 325 distinct proteins, after 
sequence homology and similarity searches were carried out to eliminate redundant 
protein annotations. Coomassie Brilliant Blue G-250 was used to visualize protein 
spots, and several proteins known to be present in serum in < 10 ng/mL 
concentrations were identified, such as interleukin-6, cathepsins, and peptide 
hormones. 

The above article exemplifies a typical approach for an MS-based protein 
profiling study. In a typical such study, proteins from a specific sample are first 
separated using a chosen appropriate method (such as 2-DE). To identify a separated 
protein, a gel spot or band is cut out, and in-gel tryptic digestion is performed 
thereafter. The gel must be stained with a mass spectrometry-compatible stain, for 
example colloidal Coomassie Brilliant Blue R-250 or Farmer's silver stain. The 
tryptic digest is then analyzed by MS such as MALDI-MS. The resulting mass 
spectrum of peptides, the peptide mass fingerprint or PMF, is searched against a 
sequence database. The PMF is compared to the masses of all theoretical tryptic 
peptides generated in silico by the search program. Programs such as Prospector, 
Sequest, and Mascot (Matrix Science, Ltd., London, UK) can be used for the 
database searching. For example, Mascot produces a statistically-based Mowse 
score that indicates whether any matches are significant. MS/MS is typically used to 
increase the likelihood of getting a database match. The PMF only contains the 
masses of the peptides. CID-MS/MS (collision induced dissociation of tandem MS) 
of peptides gives a spectrum of fragment ions that contain information about the 
amino-acid sequence. Adding this information to the peptide mass fingerprint allows 
Mascot to increase the statistical significance of a match. It is also possible in some 
cases to identify a protein by submitting only the raw MS/MS spectrum of a single 
peptide, a so-called MS/MS Ion-Search, such is the amount of information contained 
in these spectra. MS/MS of peptides in a PMF can also greatly increase the 
confidence of a protein identification, sometimes giving very high Mowse scores, 
especially with spectra from a TOF/TOF™. 
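The in silico side of a peptide mass fingerprint search can be sketched in a few lines. This is a simplified illustration, not the scoring used by Mascot or any other named program: it cleaves after K or R (except before P), computes neutral monoisotopic peptide masses from standard residue masses, and ranks database entries by a plain count of matched masses rather than a Mowse score. The two-entry database and the tolerance are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(seq):
    """Cleave C-terminal to K or R unless the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments

def peptide_mass(peptide):
    """Neutral monoisotopic mass (not the [M+H]+ ion observed in MALDI)."""
    return sum(MONO[aa] for aa in peptide) + WATER

def pmf_hits(observed, seq, tol=0.01):
    """Count observed masses matching a theoretical tryptic peptide of seq."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(seq)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)

def best_match(observed, database, tol=0.01):
    """Name of the database entry with the most matched masses."""
    return max(database, key=lambda name: pmf_hits(observed, database[name], tol))

database = {"protA": "MKTAYIAKQR", "protB": "GASPVKLLNR"}
observed = [peptide_mass("TAYIAK"), peptide_mass("QR")]
print(best_match(observed, database))
```

Real search engines additionally handle missed cleavages, modifications, and a probability-based score; the hit count above only conveys the matching principle.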

The Applied Biosystems 4700 Proteomics Analyzer, a MALDI-TOF/TOF™ 
tandem mass spectrometer, is unrivalled for the identification of proteins from 
tryptic digests, because of its sensitivity and speed. High-speed batch data 
acquisition is coupled to automated database searching using a locally-running copy 
of the Mascot search engine. When proteins cannot be identified unambiguously by 
peptide mass mapping, the digest can be further analyzed by a hybrid 
nanospray/ESI-Quadrupole-TOF-MS and MS/MS in a QSTAR mass spectrometer 
(Applied Biosystems Inc., Foster City, CA) for de novo peptide sequencing, 
sequence tag search, and/or MS/MS ion search. The static nanospray MS/MS is 
especially useful when the target protein is not known (database absent). 
The Applied Biosystems QSTAR® Pulsar i tandem mass spectrometer with a Dionex 
UltiMate capillary nanoLC system can be used for ES-LC-MS and MDLC (Multi- 
Dimensional Liquid Chromatography) analysis of peptide mixtures. A combination 
of these instruments can also perform MALDI-MS/MS, MDLC-ES-MS/MS, LC- 
MALDI, and Gel-C-MS/MS. With the Probot™ micro-fraction collector, HPLC can 
be interfaced with MALDI to spot peptides eluting from the nanoLC directly onto 
a MALDI target plate. This new LC-MALDI workflow for proteomics allows 
maximal potential for detecting proteins in complex mixtures by complementing the 
conventional 2-DE-based approach. For the traditional 2-DE approach, new and 
improved instruments, such as the Bio-Rad Protean 6-gel 2-DE apparatus and 
Packard MultiProbe II-EX robotic sample handler, in conjunction with the Applied 
Biosystems 4700 Proteomics Analyzer, allow higher sample throughputs for 
complete proteome characterisations. 

Studies such as these, using the instruments described above or their equivalents,
have accumulated a large amount of MS data regarding expressed proteins and their
specific protease digestion fragments, mostly tryptic fragments, stored in the form of
many MS databases. See, for example, MSDB (a non-identical protein sequence
database maintained by the Proteomics Department at the Hammersmith Campus of
Imperial College London; MSDB is designed specifically for mass spectrometry
applications). PET analysis can be done on these tryptic peptides to identify PETs,
which in turn are used for PET-specific antibody generation. The advantage of this
approach is that it is known for certain that these (tryptic) peptide fragments will be
generated in the sample of interest.

PETs identified based on the different methods described above may be 
combined. For example, in certain embodiments of the invention, multiple PETs 
need to be identified for any given target protein. Some of the PETs may be
identified from sequenced genome data, while others may be identified from tryptic
peptide databases. 

The PET may be found in the native protein from which it is derived as a
contiguous or as a non-contiguous amino acid sequence. It typically will comprise a
portion of the sequence of a larger peptide or protein, recognizable by a capture
agent either on the surface of an intact or partially degraded or digested protein, or
on a fragment of the protein produced by a predetermined fragmentation protocol.
The PET may be 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acid
residues in length. In a preferred embodiment, the PET is 6, 7, 8, 9 or 10 amino acid
residues, preferably 8 amino acids in length.
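One simple reading of the definition above, offered only as a sketch, is to treat every contiguous window of a chosen length as a candidate and keep those that occur in exactly one protein of the set under study. The toy sequences and the function name `unique_kmers` are hypothetical; non-contiguous PETs and any further selection criteria are not modeled.

```python
from collections import defaultdict


def unique_kmers(proteome: dict, k: int = 8) -> dict:
    """Map each protein name to the k-residue windows found in no other protein."""
    owners = defaultdict(set)  # window -> names of proteins containing it
    for name, seq in proteome.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    candidates = defaultdict(list)
    for kmer, names in owners.items():
        if len(names) == 1:  # window is characteristic of a single protein
            candidates[next(iter(names))].append(kmer)
    return dict(candidates)


if __name__ == "__main__":
    # Toy "proteome" of two short, made-up sequences.
    toy = {"P1": "MKTAYIAKQR", "P2": "MKTAYIAKLL"}
    for protein, kmers in unique_kmers(toy, k=8).items():
        print(protein, kmers)
```

The shared window MKTAYIAK is discarded because it appears in both toy sequences, while the windows covering the differing C-terminal residues are retained.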

The term "discriminate", as in "capture agents able to discriminate between",
refers to a relative difference in the binding of a capture agent to its intended protein
analyte and background binding to other proteins (or compounds) present in the
sample. In particular, a capture agent can discriminate between two different species
of proteins (or species of modifications) if the difference in binding constants is such
that a statistically significant difference in binding is produced under the assay
protocols and detection sensitivities. In preferred embodiments, the capture agent
will have a discriminating index (D.I.) of no more than 0.5, and even more preferably
no more than 0.1, 0.001, or even 0.0001, wherein D.I. is defined as Kd(a)/Kd(b),
Kd(a) being the dissociation constant for the intended analyte, and Kd(b) being the
dissociation constant for any other protein (or modified form as the case may be)
present in the sample.
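As a worked illustration of the discriminating index, the sketch below computes D.I. as the ratio of the two dissociation constants and compares it with a cutoff. It reads the recited values (0.5, 0.1, 0.001, 0.0001) as upper bounds, on the assumption that a smaller ratio means tighter binding to the intended analyte relative to the competitor; the function names and the example constants are hypothetical.

```python
def discriminating_index(kd_intended: float, kd_other: float) -> float:
    """D.I. = Kd(a) / Kd(b), with both dissociation constants in the same units."""
    if kd_intended <= 0 or kd_other <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_intended / kd_other


def discriminates(kd_intended: float, kd_other: float, cutoff: float = 0.5) -> bool:
    """True if D.I. is at or below the cutoff (interpretation: lower is better)."""
    return discriminating_index(kd_intended, kd_other) <= cutoff


if __name__ == "__main__":
    # Hypothetical values: 1 nM for the intended analyte, 1 uM for a competitor.
    di = discriminating_index(1e-9, 1e-6)
    print(f"D.I. = {di:.4g}")
    print("passes 0.1 cutoff:", discriminates(1e-9, 1e-6, cutoff=0.1))
```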

As used herein, the term "capture agent" includes any agent which is capable
of binding to a protein that includes a unique recognition sequence, e.g., with at least
detectable selectivity. A capture agent is capable of specifically interacting with
(directly or indirectly), or binding to (directly or indirectly) a unique recognition
sequence. The capture agent is preferably able to produce a signal that may be
detected. In a preferred embodiment, the capture agent is an antibody or a fragment
thereof, such as a single chain antibody, or a peptide selected from a displayed
library. In other embodiments, the capture agent may be an artificial protein, an
RNA or DNA aptamer, an allosteric ribozyme or a small molecule. In other
embodiments, the capture agent may allow for electronic (e.g., computer-based or
information-based) recognition of a unique recognition sequence. In one
embodiment, the capture agent is an agent that is not naturally found in a cell.

As used herein, the term "globally detecting" includes detecting at least 40% 
of the proteins in the sample. In a preferred embodiment, the term "globally 
detecting" includes detecting at least 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%,
95% or 100% of the proteins in the sample. Ranges intermediate to the above recited
values, e.g., 50%-70% or 75%-95%, are also intended to be part of this invention. 
For example, ranges using a combination of any of the above recited values as upper 
and/or lower limits are intended to be included. 

As used herein, the term "proteome" refers to the complete set of chemically
distinct proteins found in an organism.

As used herein, the term "organism" includes any living organism including
animals, e.g., avians, insects, mammals such as humans, mice, rats, monkeys, or
rabbits; microorganisms such as bacteria, yeast, and fungi, e.g., Escherichia coli,
Campylobacter, Listeria, Legionella, Staphylococcus, Streptococcus, Salmonella,
Bordetella, Pneumococcus, Rhizobium, Chlamydia, Rickettsia, Streptomyces,
Mycoplasma, Helicobacter pylori, Chlamydia pneumoniae, Coxiella burnetii,
Bacillus anthracis, and Neisseria; protozoa, e.g., Trypanosoma brucei; viruses, e.g.,
human immunodeficiency virus, rhinoviruses, rotavirus, influenza virus, Ebola virus,
simian immunodeficiency virus, feline leukemia virus, respiratory syncytial virus,
herpesvirus, pox virus, polio virus, parvoviruses, Kaposi's Sarcoma-Associated
Herpesvirus (KSHV), adeno-associated virus (AAV), Sindbis virus, Lassa virus,
West Nile virus, enteroviruses, such as 23 Coxsackie A viruses, 6 Coxsackie B
viruses, and 28 echoviruses, Epstein-Barr virus, caliciviruses, astroviruses, and
Norwalk virus; fungi, e.g., Rhizopus, Neurospora, yeast, or Puccinia; tapeworms,
e.g., Echinococcus granulosus, E. multilocularis, E. vogeli and E. oligarthrus; and
plants, e.g., Arabidopsis thaliana, rice, wheat, maize, tomato, alfalfa, oilseed rape,
soybean, cotton, sunflower or canola.

As used herein, "sample" refers to anything which may contain a protein
analyte. The sample may be a biological sample, such as a biological fluid or a
biological tissue. Examples of biological fluids include urine, blood, plasma, serum,
saliva, semen, stool, sputum, cerebrospinal fluid, tears, mucus, amniotic fluid or the
like. Biological tissues are aggregates of cells, usually of a particular kind together
with their intercellular substance, that form one of the structural materials of a
human, animal, plant, bacterial, fungal or viral structure, including connective,
epithelium, muscle and nerve tissues. Examples of biological tissues also include
organs, tumors, lymph nodes, arteries and individual cell(s). The sample may also be
a mixture of target-protein-containing molecules prepared in vitro.

As used herein, "a comparable control sample" refers to a control sample that 
is only different in one or more defined aspects relative to a test sample, and the 
present methods, kits or arrays are used to identify the effects, if any, of these
defined difference(s) between the test sample and the control sample, e.g., on the
amounts and types of proteins expressed and/or on the protein modification profile. 
For example, the control biosample can be derived from physiological normal 
conditions and/or can be subjected to different physical, chemical, physiological or 
drug treatments, or can be derived from different biological stages, etc. 

"Predictably result from a treatment" means that a peptide fragment can be
reliably generated by certain treatments, such as site specific protease digestion or
chemical fragmentation. Since the digestion sites are quite specific, the peptide
fragment generated by specific treatments can be reliably predicted in silico.
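As an illustration of such an in silico prediction, the sketch below applies the commonly used trypsin rule (cleave on the C-terminal side of K or R unless the next residue is P). It is a simplified model that assumes complete digestion with no missed cleavages; the function name `trypsin_digest` and the example sequence are hypothetical.

```python
def trypsin_digest(seq: str) -> list:
    """Predict tryptic peptides: cut after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Remaining C-terminal stretch that does not end in K or R.
        peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    # Made-up sequence: the R at position 3 is followed by P, so it is not cut.
    print(trypsin_digest("AKRPGKLM"))
```

Other site-specific proteases or chemical fragmentation methods could be modeled in the same way by swapping in the corresponding cleavage rule.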

A report by MacBeath and Schreiber (Science 289 (2000), pp. 1760-1763)
established that proteins could be printed and assayed in a microarray format,
and thereby had a large role in renewing the excitement for the prospect of a protein
chip. Shortly after this, Snyder and co-workers reported the preparation of a protein
chip comprising nearly 6000 yeast gene products and used this chip to identify new
classes of calmodulin- and phospholipid-binding proteins (Zhu et al., Science 293
(2001), pp. 2101-2105). The proteins were generated by cloning the open reading
frames and overproducing each of the proteins as glutathione-S-transferase (GST)-
and His-tagged fusions. The fusions were used to facilitate the purification of each
protein, and the His tags were also used in the immobilization of the proteins.
This and other references in the art established that microarrays containing
thousands of proteins could be prepared and used to discover binding interactions.
They also reported that proteins immobilized by way of the His tag - and therefore
uniformly oriented at the surface - gave superior signals to proteins randomly
attached to aldehyde surfaces.

Related work has addressed the construction of antibody arrays (de Wildt et
al., Antibody arrays for high-throughput screening of antibody-antigen interactions.
Nat. Biotechnol. 18 (2000), pp. 989-994; Haab, B.B. et al. (2001) Protein
microarrays for highly parallel detection and quantitation of specific proteins and
antibodies in complex solutions. Genome Biol. 2,
RESEARCH0004.1-RESEARCH0004.13). Specifically, in an early landmark report, de Wildt and
Tomlinson immobilized phage libraries presenting scFv antibody fragments on filter
paper to select antibodies for specific antigens in complex mixtures (supra). The use
of arrays for this purpose greatly increased the throughput when evaluating
antibodies, allowing nearly 20,000 unique clones to be screened in one cycle. Brown
and co-workers extended this concept to create molecularly defined arrays wherein
antibodies were directly attached to aldehyde-modified glass. They printed 115
commercially available antibodies and analyzed their interactions with cognate
antigens with semi-quantitative results (supra). Kingsmore and co-workers used an
analogous approach to prepare arrays of antibodies recognizing 75 distinct cytokines
and, using the rolling-circle amplification strategy (Lizardi et al., Mutation detection
and single molecule counting using isothermal rolling circle amplification. Nat.
Genet. 19 (1998), pp. 225-233), could measure cytokines at femtomolar
concentrations (Schweitzer et al., Multiplexed protein profiling on microarrays by
rolling-circle amplification. Nat. Biotechnol. 20 (2002), pp. 359-365).

These examples demonstrate the many important roles that protein chips can 
play, and give evidence for the widespread activity in fabrication of these tools. The
following subsections describe various aspects of the invention in further detail.


I. Type of Capture Agents 

In certain preferred embodiments, the capture agents used should be capable
of selective affinity reactions with PET moieties. Generally, such interaction will be
non-covalent in nature, though the present invention also contemplates the use of
capture reagents that become covalently linked to the PET.

Examples of capture agents which can be used include, but are not limited to: 
nucleotides; nucleic acids including oligonucleotides, double stranded or single 
stranded nucleic acids (linear or circular), nucleic acid aptamers and ribozymes; 
PNA (peptide nucleic acids); proteins, including antibodies (such as monoclonal or
recombinantly engineered antibodies or antibody fragments), T cell receptor and
MHC complexes, lectins and scaffolded peptides; peptides; other naturally occurring 
polymers such as carbohydrates; artificial polymers, including plastibodies; small 
organic molecules such as drugs, metabolites and natural products; and the like. 

In certain embodiments, the capture agents are immobilized, permanently or
reversibly, on a solid support such as a bead, chip, or slide. When employed to
analyze a complex mixture of proteins, the immobilized capture agents are arrayed
and/or otherwise labeled for deconvolution of the binding data to yield identity of
the capture agent (and therefore of the protein to which it binds) and (optionally) to
quantitate binding. Alternatively, the capture agents can be provided free in solution
(soluble), and other methods can be used for deconvolving PET binding in parallel.

In one embodiment, the capture agents are conjugated with a reporter
molecule such as a fluorescent molecule or an enzyme, and used to detect the
presence of bound PET on a substrate (such as a chip or bead), for example, in a
"sandwich" type assay in which one capture agent is immobilized on a support to
capture a PET, while a second, labeled capture agent also specific for the captured
PET may be added to detect/quantitate the captured PET. In this embodiment, the
peptide fragment contains two unique, non-overlapping PETs, one recognized by the
immobilized capture agent, the other recognized by the labeled detecting capture
agent. In a related embodiment, one PET unique to the peptide fragment can be used
in conjunction with a common PET shared among several protein family members.
The spatial arrangement of these two PETs is such that binding by one capture agent
will not substantially affect the binding by the other capture agent. In addition, the
length of the peptide fragment is such that it encompasses two PETs properly spaced
from each other. Preferably, the peptide fragment is at least about 15 residues long for
a sandwich assay. In other embodiments, a labeled PET peptide is used in a
competitive binding assay to determine the amount of unlabeled PET (from the
sample) that binds to the capture agent. In this embodiment, the peptide fragment need
only be long enough to encompass one PET, so peptides as short as 5-8 residues
may be suitable.

Generally, the sandwich assay tends to be more (e.g., about 10, 100, or 1000
fold more) sensitive than the competitive binding assay.
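The requirement that a fragment carry two non-overlapping, suitably spaced PETs for a sandwich assay can be checked computationally. The following is a minimal sketch under stated assumptions: PETs are contiguous, the spacing requirement is modeled by a hypothetical `min_gap` parameter (in residues), and the example fragment is made up.

```python
from itertools import combinations


def find_pets(fragment: str, pets) -> list:
    """Return sorted (start, end, pet) hits for every occurrence of each PET."""
    hits = []
    for pet in pets:
        start = fragment.find(pet)
        while start != -1:
            hits.append((start, start + len(pet), pet))
            start = fragment.find(pet, start + 1)
    return sorted(hits)


def sandwich_pairs(fragment: str, pets, min_gap: int = 0) -> list:
    """Pairs of PETs that do not overlap and are separated by at least min_gap."""
    hits = find_pets(fragment, pets)
    return [
        (first[2], second[2])
        for first, second in combinations(hits, 2)
        if second[0] - first[1] >= min_gap  # negative gap means overlap
    ]


if __name__ == "__main__":
    fragment = "TSSVDGGLAAQEPPQYSH"  # made-up 18-residue fragment
    print(sandwich_pairs(fragment, ["QEPPQYSH", "TSSVDGGL"], min_gap=2))
```

A fragment that yields no pair but still contains a single PET would, on this reading, remain a candidate for the competitive format.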

An important advantage of the invention is that useful capture agents can be
identified and/or synthesized even in the absence of a sample of the protein to be
detected. With the completion of whole-genome sequencing in a number of organisms,
such as human, fly (Drosophila melanogaster) and nematode (C. elegans), PETs of a
given length or combination thereof can be identified for any single given protein in a
certain organism, and capture agents for any of these proteins of interest can then be
made without ever cloning and expressing the full length protein.

In addition, the suitability of any PET to serve as an antigen or target of a
capture agent can be further checked against other available information. For
example, since the amino acid sequences of many proteins can now be inferred from
available genomic data, sequence from the structure of the proteins unique to the
sample can be determined by computer-aided searching, and the location of the
peptide in the protein, and whether it will be accessible in the intact protein, can be
determined. Once a suitable PET peptide is found, it can be synthesized using
known techniques. With a sample of the PET in hand, an agent that interacts with
the peptide, such as an antibody or peptidic binder, can be raised against it or panned
from a library. In this situation, care must be taken to ensure that any chosen
fragmentation protocol for the sample does not cleave the protein in a way that
destroys or masks the PET. This can be determined theoretically and/or
experimentally, and the process can be repeated until the selected PET is reliably
retrieved by a capture agent(s).

The PET set selected according to the teachings of the present invention can
be used to generate peptides either through enzymatic cleavage of the protein from
which they were generated and selection of peptides, or preferably through peptide
synthesis methods.

Proteolytically cleaved peptides can be separated by chromatographic or
electrophoretic procedures and purified and renatured via well known prior art
methods.

Synthetic peptides can be prepared by classical methods known in the art, for
example, by using standard solid phase techniques. The standard methods include
exclusive solid phase synthesis, partial solid phase synthesis methods, fragment
condensation, classical solution synthesis, and even recombinant DNA
technology. See, e.g., Merrifield, J. Am. Chem. Soc., 85:2149 (1963), incorporated
herein by reference. Solid phase peptide synthesis procedures are well known in the
art and further described by John Morrow Stewart and Janis Dillaha Young, Solid
Phase Peptide Synthesis (2nd Ed., Pierce Chemical Company, 1984).

Synthetic peptides can be purified by preparative high performance liquid
chromatography [Creighton T. (1983) Proteins, structures and molecular principles.
WH Freeman and Co. N.Y.], and their composition can be confirmed via
amino acid sequencing.

In addition, other additives such as stabilizers, buffers, blockers and the like
may also be provided with the capture agent.




A. Antibodies 

In one embodiment, the capture agent is an antibody or an antibody-like
molecule (collectively "antibody"). Thus an antibody useful as a capture agent may be
a full-length antibody or a fragment thereof, which includes an "antigen-binding
portion" of an antibody. The term "antigen-binding portion", as used herein, refers
to one or more fragments of an antibody that retain the ability to specifically bind to
an antigen. It has been shown that the antigen-binding function of an antibody can
be performed by fragments of a full-length antibody. Examples of binding fragments
encompassed within the term "antigen-binding portion" of an antibody include (i) a
Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains;
(ii) a F(ab')2 fragment, a bivalent fragment comprising two Fab fragments linked by
a disulfide bridge at the hinge region; (iii) a Fd fragment consisting of the VH and
CH1 domains; (iv) a Fv fragment consisting of the VL and VH domains of a single
arm of an antibody; (v) a dAb fragment (Ward et al., (1989) Nature 341:544-546),
which consists of a VH domain; and (vi) an isolated complementarity determining
region (CDR). Furthermore, although the two domains of the Fv fragment, VL and
VH, are coded for by separate genes, they can be joined, using recombinant methods,
by a synthetic linker that enables them to be made as a single protein chain in which
the VL and VH regions pair to form monovalent molecules (known as single chain Fv
(scFv); see, e.g., Bird et al. (1988) Science 242:423-426; Huston et al. (1988)
Proc. Natl. Acad. Sci. USA 85:5879-5883; and Osbourn et al. (1998) Nature
Biotechnology 16:778). Such single chain antibodies are also intended to be
encompassed within the term "antigen-binding portion" of an antibody. Any VH and
VL sequences of a specific scFv can be linked to human immunoglobulin constant
region cDNA or genomic sequences, in order to generate expression vectors
encoding complete IgG molecules or other isotypes. VH and VL can also be used in
the generation of Fab, Fv or other fragments of immunoglobulins using either
protein chemistry or recombinant DNA technology. Other forms of single chain
antibodies, such as diabodies, are also encompassed. Diabodies are bivalent,
bispecific antibodies in which VH and VL domains are expressed on a single
polypeptide chain, but using a linker that is too short to allow for pairing between
the two domains on the same chain, thereby forcing the domains to pair with
complementary domains of another chain and creating two antigen binding sites
(see, e.g., Holliger, P., et al. (1993) Proc. Natl. Acad. Sci. USA 90:6444-6448;
Poljak, R. J., et al. (1994) Structure 2:1121-1123).

Still further, an antibody or antigen-binding portion thereof may be part of a
larger immunoadhesion molecule, formed by covalent or noncovalent association of
the antibody or antibody portion with one or more other proteins or peptides.
Examples of such immunoadhesion molecules include use of the streptavidin core
region to make a tetrameric scFv molecule (Kipriyanov, S.M., et al. (1995) Human
Antibodies and Hybridomas 6:93-101) and use of a cysteine residue, a marker
peptide and a C-terminal polyhistidine tag to make bivalent and biotinylated scFv
molecules (Kipriyanov, S.M., et al. (1994) Mol. Immunol. 31:1047-1058). Antibody
portions, such as Fab and F(ab')2 fragments, can be prepared from whole antibodies
using conventional techniques, such as papain or pepsin digestion, respectively, of
whole antibodies. Moreover, antibodies, antibody portions and immunoadhesion
molecules can be obtained using standard recombinant DNA techniques.

Antibodies may be polyclonal or monoclonal. The terms "monoclonal
antibodies" and "monoclonal antibody composition," as used herein, refer to a
population of antibody molecules that contain only one species of an antigen binding
site capable of immunoreacting with a particular epitope of an antigen, whereas the
terms "polyclonal antibodies" and "polyclonal antibody composition" refer to a
population of antibody molecules that contain multiple species of antigen binding
sites capable of interacting with a particular antigen. A monoclonal antibody
composition typically displays a single binding affinity for a particular antigen with
which it immunoreacts.

Any art-recognized method can be used to generate a PET-directed
antibody. For example, a PET (alone or linked to a hapten) can be used to immunize
a suitable subject (e.g., rabbit, goat, mouse or other mammal or vertebrate). For
example, the methods described in U.S. Patent Nos. 5,422,110; 5,837,268;
5,708,155; 5,723,129; and 5,849,531 (the contents of each of which are incorporated
herein by reference) can be used. The immunogenic preparation can further include
an adjuvant, such as Freund's complete or incomplete adjuvant, or similar
immunostimulatory agent. Immunization of a suitable subject with a PET induces a
polyclonal anti-PET antibody response. The anti-PET antibody titer in the
immunized subject can be monitored over time by standard techniques, such as with
an enzyme linked immunosorbent assay (ELISA) using immobilized PET.

The antibody molecules directed against a PET can be isolated from the
mammal (e.g., from the blood) and further purified by well known techniques, such
as protein A chromatography to obtain the IgG fraction. At an appropriate time after
immunization, e.g., when the anti-PET antibody titers are highest,
antibody-producing cells can be obtained from the subject and used to prepare, e.g.,
monoclonal antibodies by standard techniques, such as the hybridoma technique
originally described by Kohler and Milstein ((1975) Nature 256:495-497) (see also
Brown et al. (1981) J. Immunol. 127:539-46; Brown et al. (1980) J. Biol. Chem.
255:4980-83; Yeh et al. (1976) Proc. Natl. Acad. Sci. USA 76:2927-31; and Yeh et
al. (1982) Int. J. Cancer 29:269-75), the more recent human B cell hybridoma
technique (Kozbor et al. (1983) Immunol. Today 4:72), or the EBV-hybridoma
technique (Cole et al. (1985), Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96). The technology for producing monoclonal antibody
hybridomas is well known (see generally R. H. Kenneth, in Monoclonal Antibodies:
A New Dimension In Biological Analyses, Plenum Publishing Corp., New York,
New York (1980); E. A. Lerner (1981) Yale J. Biol. Med. 54:387-402; M. L. Gefter
et al. (1977) Somatic Cell Genet. 3:231-36). Briefly, an immortal cell line (typically
a myeloma) is fused to lymphocytes (typically splenocytes) from a mammal
immunized with a PET immunogen as described above, and the culture supernatants
of the resulting hybridoma cells are screened to identify a hybridoma producing a
monoclonal antibody that binds a PET.

Any of the many well known protocols used for fusing lymphocytes and
immortalized cell lines can be applied for the purpose of generating an anti-PET
monoclonal antibody (see, e.g., G. Galfre et al. (1977) Nature 266:550-52; Gefter et
al., Somatic Cell Genet., cited supra; Lerner, Yale J. Biol. Med., cited supra;
Kenneth, Monoclonal Antibodies, cited supra). Moreover, the ordinarily skilled
worker will appreciate that there are many variations of such methods which also
would be useful. Typically, the immortal cell line (e.g., a myeloma cell line) is
derived from the same mammalian species as the lymphocytes. For example, murine
hybridomas can be made by fusing lymphocytes from a mouse immunized with an
immunogenic preparation of the present invention with an immortalized mouse cell
line. Preferred immortal cell lines are mouse myeloma cell lines that are sensitive to
culture medium containing hypoxanthine, aminopterin and thymidine ("HAT
medium"). Any of a number of myeloma cell lines can be used as a fusion partner
according to standard techniques, e.g., the P3-NS1/1-Ag4-1, P3-x63-Ag8.653 or
Sp2/O-Ag14 myeloma lines. These myeloma lines are available from ATCC.
Typically, HAT-sensitive mouse myeloma cells are fused to mouse splenocytes
using polyethylene glycol ("PEG"). Hybridoma cells resulting from the fusion are
then selected using HAT medium, which kills unfused and unproductively fused
myeloma cells (unfused splenocytes die after several days because they are not
transformed). Hybridoma cells producing a monoclonal antibody of the invention
are detected by screening the hybridoma culture supernatants for antibodies that bind
a PET, e.g., using a standard ELISA assay.

In addition, automated screening of antibody or scaffold libraries against 
arrays of target proteins / PETs will be the most rapid way of developing thousands 
of reagents that can be used for protein expression profiling. Furthermore, 
polyclonal antisera, hybridomas or selection from library systems may also be used
to quickly generate the necessary capture agents. A high-throughput process for
antibody isolation is described by Hayhurst and Georgiou in Curr Opin Chem Biol 
5(6):683-9, December 2001 (incorporated by reference). 

The PET antigens used for the generation of PET-specific antibodies are
preferably blocked at either the N- or C-terminal end, most preferably at both ends
(see Figure 5) to generate neutral groups, since antibodies raised against peptides
with non-neutralized ends may not be functional for the methods of the invention.
The PET antigens can be most easily synthesized using standard molecular biology
or chemical methods, for example, with a peptide synthesizer. The termini can be
blocked with NH2- or COO- groups as appropriate, or any other blocking agents to
eliminate free ends. In a preferred embodiment, one end (either N- or C-terminus) of
the PET will be conjugated with a carrier protein such as KLH or BSA to facilitate
antibody generation. KLH represents Keyhole-limpet hemocyanin, an oxygen
carrying copper protein found in the keyhole-limpet (Megathura crenulata), a
primitive mollusk sea snail. KLH has a complex molecular arrangement, contains
a diverse antigenic structure, and elicits a strong nonspecific immune
response in host animals. Therefore, when small peptides (which may not be very
immunogenic) are used as immunogens, they are preferably conjugated to KLH or
other carrier proteins (such as BSA) for enhanced immune responses in the host
animal. The resulting antibodies can be affinity purified using a polypeptide
corresponding to the PET-containing tryptic peptide of interest (see Figure 5).

Blocking the ends of the PET in antibody generation may be advantageous, since
in many (if not most) cases, the selected PETs are contained within larger (tryptic)
fragments. In these cases, the PET-specific antibodies are required to bind PETs in
the middle of a peptide fragment. Therefore, blocking both the C- and N-termini of
the PETs best simulates the antibody binding of peptide fragments in a digested
sample. Similarly, if the selected PET sequence happens to be at the N- or
C-terminal end of a target fragment, then only the other end of the immunogen needs
to be blocked, preferably by a carrier such as KLH or BSA.
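The blocking rule just described can be expressed as a small decision helper. This is only a sketch of the stated logic: an end of the PET immunogen is flagged for blocking when, in the parental fragment, further residues lie beyond that end. The function name `termini_to_block` and the example sequences are hypothetical.

```python
def termini_to_block(fragment: str, pet: str) -> tuple:
    """Return (block_n_terminus, block_c_terminus) for a PET within its fragment."""
    start = fragment.find(pet)
    if start == -1:
        raise ValueError("PET not found in fragment")
    end = start + len(pet)
    block_n = start > 0             # residues precede the PET in the fragment
    block_c = end < len(fragment)   # residues follow the PET in the fragment
    return block_n, block_c


if __name__ == "__main__":
    # PET in the middle of a made-up fragment: both ends flagged.
    print(termini_to_block("AGTSSVDGGLAK", "TSSVDGGL"))
    # PET at the N-terminal end of the fragment: only the C-terminal end flagged.
    print(termini_to_block("TSSVDGGLAK", "TSSVDGGL"))
```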

Figure 24 below shows that PET-specific antibodies are highly specific and 
have high affinity for their respective PET-antigens. 

When generating PET-specific antibodies, preferably monoclonal antibodies,
a peptide immunogen consisting essentially of the target PET sequence may be
administered to an animal according to standard antibody generation protocols for
short peptide antigens. In one embodiment, the short peptide antigen may be
conjugated with a carrier such as KLH. However, when screening for antibodies
specific for the PET sequence, it is preferred that the parental peptide fragment
containing the PET sequence (such as the fragment resulting from trypsin digestion)
is used. This ensures that the identified antibodies will be not only specific for the
original PET sequence, but also able to recognize the PET peptide fragment for
which the antibody is designed. Optionally, the specificity of the identified antibody
can be further verified by reacting with the original immunogen, such as the
end-blocked PET sequence itself.

In certain embodiments, several different immunogens for different PET
sequences may be simultaneously administered to the same animal, so that different
antibodies may be generated in one animal. For each immunogen, a
separate screen would be needed to identify antibodies specific for the immunogen.

In an alternative embodiment, different PETs may be linked together in a
single, longer immunogen for administration to an animal. The linker sequence can
be a flexible linker such as GS, GSSSS or repeats thereof (such as three repeats).
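For illustration, a linked immunogen of this kind can be assembled as a simple string join. The helper name `link_pets` and the example PET sequences are hypothetical; only the GS and GSSSS linkers are taken from the text above.

```python
def link_pets(pets, linker: str = "GS", repeats: int = 1) -> str:
    """Join PET sequences into one immunogen, separated by a repeated linker."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return (linker * repeats).join(pets)


if __name__ == "__main__":
    pets = ["TSSVDGGL", "QEPPQYSH"]  # made-up example PETs
    print(link_pets(pets, linker="GS", repeats=3))
    print(link_pets(pets, linker="GSSSS"))
```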

In both embodiments described above, the different immunogens may be
from the same or different organisms or proteomes. These methods are all potential
means of reducing costs in antibody generation. An unexpected advantage of using
linked PET sequences as immunogen is that longer immunogens may in certain
situations produce higher affinity antibodies than those produced using short PET
sequences.

(i) PET-Specific Antibody Knowledge Database 

The instant invention also provides an antibody knowledge database, which
provides various important information pertaining to these antibodies. A specific
subset of the antibodies will be PET-specific antibodies, which are either generated
de novo based on the criteria set forth in the instant application, or generated by
others in the prior art and happen to recognize certain PETs.

Information to be included in the knowledge database can be quite
comprehensive. Such knowledge may be further classified as public or proprietary.
Examples of public information may include: target protein name, antibody source,
catalog number, potential applications, etc. Exemplary proprietary information
includes parental tryptic fragments in one or more organisms or specific samples,
immunogen peptide sequences and whether or not they are PETs, affinity for the
target PET, degree of cross-reactivity with other related epitopes (such as the closest
nearest neighbors), and usefulness for various PET assays.
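One way such a database record might be organized is sketched below in Python. The class name `AntibodyRecord`, the field names, the units, and the example values are hypothetical; only the categories of public and proprietary information are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AntibodyRecord:
    # Public information
    target_protein: str
    source: str
    catalog_number: str
    applications: List[str] = field(default_factory=list)
    # Proprietary information
    parental_tryptic_fragments: Dict[str, str] = field(default_factory=dict)  # organism -> fragment
    immunogen_sequence: Optional[str] = None
    immunogen_is_pet: Optional[bool] = None
    pet_affinity_kd_molar: Optional[float] = None  # assumed unit: mol/L
    cross_reactivity: Dict[str, float] = field(default_factory=dict)  # neighbor -> relative binding
    suitable_pet_assays: List[str] = field(default_factory=list)

    def public_view(self) -> dict:
        """Return only the fields classified as public."""
        return {
            "target_protein": self.target_protein,
            "source": self.source,
            "catalog_number": self.catalog_number,
            "applications": list(self.applications),
        }


# Placeholder values, for illustration only.
record = AntibodyRecord(
    target_protein="Cdk8",
    source="ExampleVendor",
    catalog_number="EX-0001",
    applications=["Western blot"],
    immunogen_is_pet=True,
)
print(record.public_view())
```

Separating the public view from the full record mirrors the public/proprietary classification, so that a shared export never exposes immunogen sequences or affinity data by accident.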

To this end, such information on about 1000 anti-peptide antibodies has already
been collected or generated in the knowledge database. Among them, about 128
antibodies are deemed compatible with trypsin-digested samples. Certain commercially
available antibodies, their immunogens and the PET sequences they happen to contain,
and the nearest neighbors of these PETs are listed below.

Commercial Anti-PET Antibodies

Protein                      PTP (Immunogen/PET underlined)   Nearest Neighbors
---------------------------  -------------------------------  ---------------------------------------
Anti-fvrlin F                TASPTSSVDGGLGALP K               SASIDGGL; SSSSDGGL; TGSVDGGA; ESSSDGGL
Anti-phospho-SHC (Tyr239)    FAOMPTTT TV<sT^<5I NT MAADPK     TSTAST NT; ISTSSLNV; VSLSSLNL; MDTSSLNL
Anti-phospho-PP2A (Tyr307)   EEEADINOLTEEFF.K                 ADLNQLTQ; RDINQLSE; ADFNQLAE; ADINMVTE
Anti-Cdk8                    ATSOOPPOYSHOTHR                  QEPPQYSH; QQQPQFSH; QQPPQHSK; QQPPQQQH
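The nearest-neighbor column can be understood with a small sketch. The Python code below ranks candidate sequences by the number of mismatched positions (Hamming distance) relative to a PET. This ranking criterion, the function names, and the query sequence `QQPPQYSH` are assumptions made for illustration; the similarity measure actually used to compile the table may differ.

```python
from typing import Iterable, List


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbors(pet: str, candidates: Iterable[str], k: int = 4) -> List[str]:
    """Return the k candidates closest to the PET, ties broken alphabetically."""
    same_length = {s for s in candidates if len(s) == len(pet) and s != pet}
    return sorted(same_length, key=lambda s: (hamming(pet, s), s))[:k]


# Assumed query PET and a toy candidate set; a real search would scan
# every same-length fragment of the proteome of interest.
query = "QQPPQYSH"
pool = ["QEPPQYSH", "QQQPQFSH", "QQPPQHSK", "QQPPQQQH", "ADLNQLTQ"]
print(nearest_neighbors(query, pool, k=4))
```

A capture agent for the PET would then be tested for cross-reactivity against the sequences returned by such a search, since these are the most likely sources of nonspecific binding.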


B. Proteins and peptides

Other methods for generating the capture agents of the present invention
include phage-display technology described in, for example, Dower et al., WO
91/17271, McCafferty et al., WO 92/01047, Herzig et al., US 5,877,218, Winter et
al., US 5,871,907, Winter et al., US 5,858,657, Holliger et al., US 5,837,242,
Johnson et al., US 5,733,743 and Hoogenboom et al., US 5,565,332 (the contents of
each of which are incorporated by reference). In these methods, libraries of phage
are produced in which members display different antibodies, antibody binding sites,
or peptides on their outer surfaces. Antibodies are usually displayed as Fv or Fab
fragments. Phage displaying sequences with a desired specificity are selected by
affinity enrichment to a specific PET.

Methods such as yeast display and in vitro ribosome display may also be
used to generate the capture agents of the present invention. The foregoing methods
are described in, for example, Methods in Enzymology Vol 328, Part C:
Protein-protein interactions & Genomics and Bradbury A. (2001) Nature Biotechnology
19:528-529, the contents of each of which are incorporated herein by reference.

In a related embodiment, proteins or polypeptides may also act as capture
agents of the present invention. These peptide capture agents also specifically bind
to a given PET, and can be identified, for example, using phage display screening
against an immobilized PET, or using any other art-recognized methods. Once
identified, the peptidic capture agents may be prepared by any of the well known
methods for preparing peptidic sequences. For example, the peptidic capture agents
may be produced in prokaryotic or eukaryotic host cells by expression of
polynucleotides encoding the particular peptide sequence. Alternatively, such
peptidic capture agents may be synthesized by chemical methods. Methods for
expression of heterologous peptides in recombinant hosts, chemical synthesis of
peptides, and in vitro translation are well known in the art and are described further
in Maniatis et al., Molecular Cloning: A Laboratory Manual (1989), 2nd Ed., Cold
Spring Harbor, N.Y.; Berger and Kimmel, Methods in Enzymology, Volume 152,
Guide to Molecular Cloning Techniques (1987), Academic Press, Inc., San Diego,
Calif.; Merrifield, J. (1969) J. Am. Chem. Soc. 91:501; Chaiken, I. M. (1981) CRC
Crit. Rev. Biochem. 11:255; Kaiser et al. (1989) Science 243:187; Merrifield, B.
(1986) Science 232:342; Kent, S. B. H. (1988) Ann. Rev. Biochem. 57:957; and
Offord, R. E. (1980) Semisynthetic Proteins, Wiley Publishing, which are
incorporated herein in their entirety by reference.

The peptidic capture agents may also be prepared by any suitable method for
chemical peptide synthesis, including solution-phase and solid-phase chemical
synthesis. Preferably, the peptides are synthesized on a solid support. Methods for
chemically synthesizing peptides are well known in the art (see, e.g., Bodanszky, M.
Principles of Peptide Synthesis, Springer Verlag, Berlin (1993) and Grant, G.A. (ed.),
Synthetic Peptides: A User's Guide, W.H. Freeman and Company, New York
(1992)). Automated peptide synthesizers useful to make the peptidic capture agents
are commercially available.

C. Scaffolded peptides 

An alternative approach to generating capture agents for use in the present
invention makes use of scaffolded peptides, e.g., peptides displayed
on the surface of a protein. The idea is that restricting the degrees of freedom of a
peptide by incorporating it into a surface-exposed protein loop could reduce the
entropic cost of binding to a target protein, resulting in higher affinity. Thioredoxin,
fibronectin, avian pancreatic polypeptide (aPP) and albumin, as examples, are small,
stable proteins with surface loops that will tolerate a great deal of sequence
variation. To identify scaffolded peptides that selectively bind a target PET, libraries
of chimeric proteins can be generated in which random peptides are used to replace
the native loop sequence, and through a process of affinity maturation, those which
selectively bind a PET of interest are identified.


D. Simple peptides and peptidomimetic compounds 

Peptides are also attractive candidates for capture agents because they
combine advantages of small molecules and proteins. Large, diverse libraries can be
made either biologically or synthetically, and the "hits" obtained in binding screens
against PET moieties can be made synthetically in large quantities.

Peptide-like oligomers (Soth et al. (1997) Curr. Opin. Chem. Biol. 1:120-129)
such as peptoids (Figliozzi et al., (1996) Methods Enzymol. 267:437-447) can
also be used as capture reagents, and can have certain advantages over peptides.
They are impervious to proteases and their synthesis can be simpler and cheaper
than that of peptides, particularly if one considers the use of functionality that is not
found in the 20 common amino acids.

E. Nucleic acids

In another embodiment, aptamers binding specifically to a PET may also be
used as capture agents. As used herein, the term "aptamer," e.g., RNA aptamer or
DNA aptamer, includes single-stranded oligonucleotides that bind specifically to a
target molecule. Aptamers are selected, for example, by employing an in vitro
evolution protocol called systematic evolution of ligands by exponential enrichment.
Aptamers bind tightly and specifically to target molecules; most aptamers to proteins
bind with a Kd (equilibrium dissociation constant) in the range of 1 pM to 1 nM.
Aptamers and methods of preparing them are described in, for example, E.N. Brody
et al. (1999) Mol. Diagn. 4:381-388, the contents of which are incorporated herein
by reference.

In one embodiment, the subject aptamers can be generated using SELEX, a
method for generating very high affinity receptors that are composed of nucleic
acids instead of proteins. See, for example, Brody et al. (1999) Mol. Diagn.
4:381-388. SELEX offers a completely in vitro combinatorial chemistry alternative
to traditional protein-based antibody technology. Similar to phage display, SELEX
is advantageous in terms of obviating animal hosts, reducing production time and
labor, and simplifying purification involved in generating specific binding agents to
a particular target PET.

To further illustrate, SELEX can be performed by synthesizing a random
oligonucleotide library, e.g., of greater than 20 bases in length, which is flanked by
known primer sequences. Synthesis of the random region can be achieved by mixing
all four nucleotides at each position in the sequence. Thus, the diversity of the
random sequence is maximally 4^n, where n is the length of the sequence, minus the
frequency of palindromes and symmetric sequences. The greater degree of diversity
conferred by SELEX affords greater opportunity to select for oligonucleotides that
form 3-dimensional binding sites. Selection of high affinity oligonucleotides is
achieved by exposing a random SELEX library to an immobilized target PET.
Sequences that bind and are not washed away are retained and amplified by
PCR for subsequent rounds of SELEX, which consist of alternating affinity
selection and PCR amplification of bound nucleic acid sequences. Four to five
rounds of SELEX are typically sufficient to produce a high affinity set of aptamers.
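The 4^n upper bound on library diversity can be made concrete with a short calculation. The Python sketch below computes the bound and, for an assumed amount of synthesized library, the average number of copies of each possible sequence. The function names and the 1 nmol library size are illustrative assumptions, and the small correction for palindromes and symmetric sequences mentioned above is ignored.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def max_library_diversity(n: int) -> int:
    """Upper bound on distinct sequences in a random region of n bases."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 4 ** n


def mean_copies_per_sequence(n: int, library_moles: float) -> float:
    """Average copies of each possible sequence in a library of given size."""
    return library_moles * AVOGADRO / max_library_diversity(n)


for length in (20, 30, 40):
    diversity = max_library_diversity(length)
    copies = mean_copies_per_sequence(length, library_moles=1e-9)  # assumed 1 nmol
    print(f"n={length}: diversity={diversity:.3e}, mean copies={copies:.3e}")
```

For a 20-base random region the bound is about 1.1 x 10^12 sequences, so a nanomole-scale synthesis can in principle sample every sequence many times; for much longer random regions the library covers only a fraction of sequence space.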

Therefore, hundreds to thousands of aptamers can be made in an
economically feasible fashion. Blood and urine can be analyzed on aptamer chips
that capture and quantitate proteins. SELEX has also been adapted to the use of
5-bromo (5-Br) and 5-iodo (5-I) deoxyuridine residues. These halogenated bases can
be specifically cross-linked to proteins. Selection pressure during in vitro evolution
can be applied for both binding specificity and specific photo-cross-linkability.
These are sufficiently independent parameters to allow one reagent, a
photo-cross-linkable aptamer, to substitute for two reagents, the capture antibody and the
detection antibody, in a typical sandwich array. After a cycle of binding, washing,
cross-linking, and detergent washing, proteins will be specifically and covalently
linked to their cognate aptamers. Because no other proteins are present on the chips,
a protein-specific stain will now show a meaningful array of pixels on the chip.
Combined with learning algorithms and retrospective studies, this technique should
lead to a robust yet simple diagnostic chip.

In yet another related embodiment, a capture agent may be an allosteric
ribozyme. The term "allosteric ribozymes," as used herein, includes single-stranded
oligonucleotides that perform catalysis when triggered with a variety of effectors,
e.g., nucleotides, second messengers, enzyme cofactors, pharmaceutical agents,
proteins, and oligonucleotides. Allosteric ribozymes and methods for preparing them
are described in, for example, S. Seetharaman et al. (2001) Nature Biotechnol.
19:336-341, the contents of which are incorporated herein by reference. According to
Seetharaman et al., a prototype biosensor array has been assembled from engineered
RNA molecular switches that undergo ribozyme-mediated self-cleavage when
triggered by specific effectors. Each type of switch is prepared with a
5'-thiotriphosphate moiety that permits immobilization on gold to form individually
addressable pixels. The ribozymes comprising each pixel become active only when
presented with their corresponding effector, such that each type of switch serves as a
specific analyte sensor. An addressed array created with seven different RNA
switches was used to report the status of targets in complex mixtures containing
metal ion, enzyme cofactor, metabolite, and drug analytes. The RNA switch array
also was used to determine the phenotypes of Escherichia coli strains for adenylate
cyclase function by detecting naturally produced 3',5'-cyclic adenosine
monophosphate (cAMP) in bacterial culture media.

F. Plastibodies

In certain embodiments the subject capture agent is a plastibody. The term 
"plastibody" refers to polymers imprinted with selected template molecules. See, for 
example, Bruggemann (2002) Adv Biochem Eng Biotechnol 76:127-63; and Haupt 
et al. (1998) Trends Biotech. 16:468-475. The plastibody principle is based on 
molecular imprinting, namely, a recognition site that can be generated by
stereoregular display of pendant functional groups that are grafted to the sidechains 
of a polymeric chain to thereby mimic the binding site of, for example, an antibody. 

G. Chimeric binding agents derived from two low-affinity ligands 

Still another strategy for generating suitable capture agents is to link two or
more modest-affinity ligands to generate a high-affinity capture agent. Given the
appropriate linker, such chimeric compounds can exhibit affinities that approach the
product of the affinities of the two individual ligands for the PET. To illustrate, a
collection of compounds is screened at high concentrations for weak interactors of a
target PET. The compounds that do not compete with one another are then identified
and a library of chimeric compounds is made with linkers of different lengths. This
library is then screened for binding to the PET at much lower concentrations to
identify high affinity binders. Such a technique may also be applied to peptides or
any other type of modest-affinity PET-binding compound.
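The statement that chimeric affinities can approach the product of the individual affinities can be illustrated numerically. The Python sketch below assumes the idealized limit in which the binding free energies of the two ligands simply add, so that the dissociation constants multiply (expressed relative to a 1 M standard state). The function name and the example Kd values are hypothetical, and real linkers generally deliver less than this ideal gain.

```python
def ideal_chimeric_kd(kd_a_molar: float, kd_b_molar: float,
                      standard_state_molar: float = 1.0) -> float:
    """Idealized Kd of a linked ligand pair, assuming additive binding free energies.

    This is a best-case limit: it ignores linker strain and the entropic cost
    of restricting the linker.
    """
    if kd_a_molar <= 0 or kd_b_molar <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_a_molar * kd_b_molar / standard_state_molar


# Two hypothetical weak binders found in a high-concentration screen.
kd_first = 1e-3   # 1 mM
kd_second = 1e-4  # 100 uM
print(f"ideal chimeric Kd ~ {ideal_chimeric_kd(kd_first, kd_second):.1e} M")
```

Under these assumptions, two ligands that individually bind only in the millimolar to high-micromolar range would give a chimera binding in the sub-micromolar range, which is why the chimeric library is rescreened at much lower concentrations.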

H. Labels for Capture Agents 

The capture agents of the present invention may be modified to enable
detection using techniques known to one of ordinary skill in the art, such as
fluorescent, radioactive, chromatic, optical, and other physical or chemical labels, as
described herein below.

I. Miscellaneous

In addition, for any given PET, multiple capture agents belonging to each of
the above described categories of capture agents may be available. These multiple
capture agents may have different properties, such as affinity / avidity / specificity
for the PET. Different affinities are useful in covering the wide dynamic ranges of
expression which some proteins can exhibit. Depending on specific use, in any given
array of capture agents, different types / amounts of capture agents may be present
on a single chip / array to achieve optimal overall performance.
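Why capture agents of different affinities help cover a wide dynamic range can be seen from a simple binding calculation. The Python sketch below assumes an idealized 1:1 equilibrium in which the analyte is in excess over the capture agent; the function names and the 10%-90% occupancy window are illustrative assumptions, not parameters taken from the specification.

```python
from typing import Tuple


def fraction_bound(analyte_molar: float, kd_molar: float) -> float:
    """Fractional occupancy of a capture agent at equilibrium (1:1 binding)."""
    return analyte_molar / (analyte_molar + kd_molar)


def responsive_range(kd_molar: float, low: float = 0.1,
                     high: float = 0.9) -> Tuple[float, float]:
    """Analyte concentrations giving occupancy between `low` and `high`."""
    to_conc = lambda f: kd_molar * f / (1.0 - f)  # inverse of fraction_bound
    return to_conc(low), to_conc(high)


# Three hypothetical capture agents for the same PET.
for kd in (1e-12, 1e-9, 1e-6):
    lo_c, hi_c = responsive_range(kd)
    print(f"Kd={kd:.0e} M responds from {lo_c:.1e} M to {hi_c:.1e} M")
```

Each agent responds over roughly two orders of magnitude of concentration centered on its Kd, so combining agents with staggered Kd values on one array extends the measurable range well beyond what any single agent provides.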

In a preferred embodiment, capture agents are raised against PETs that are 
located on the surface of the protein of interest, e.g., hydrophilic regions. PETs that 
are located on the surface of the protein of interest may be identified using any of 
the well known software available in the art. For example, the Naccess program may
be used. 

Naccess is a program that calculates the accessible area of a molecule from a
PDB (Protein Data Bank) format file. It can calculate the atomic and residue
accessibilities for both proteins and nucleic acids. Naccess calculates the atomic
accessible area when a probe is rolled around the van der Waals surface of a
macromolecule. Such three-dimensional co-ordinate sets are available from the PDB
at the Brookhaven National Laboratory. The program uses the method of Lee &
Richards (1971) J. Mol. Biol. 55:379-400, whereby a probe of given radius is rolled
around the surface of the molecule, and the path traced out by its center is the
accessible surface.

The solvent accessibility method described in Boger, J., Emini, E.A. &
Schmidt, A., Surface probability profile-An heuristic approach to the selection of
synthetic peptide antigens, Reports on the Sixth International Congress in
Immunology (Toronto) 1986 p.250 also may be used to identify PETs that are
located on the surface of the protein of interest. The package MOLMOL (Koradi, R.
et al. (1996) J. Mol. Graph. 14:51-55) and Eisenhaber's ASC method (Eisenhaber
and Argos (1993) J. Comput. Chem. 14:1272-1280; Eisenhaber et al. (1995) J.
Comput. Chem. 16:273-284) may also be used.
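Once per-residue accessibilities have been obtained from any of the programs above, candidate surface PETs can be picked by a sliding-window scan. The Python sketch below assumes that a relative accessibility value (in percent) is already available for each residue; the function name, the 8-residue window, the 50% threshold, and the example data are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def surface_windows(sequence: str, rel_accessibility: Sequence[float],
                    window: int = 8, min_mean: float = 50.0
                    ) -> List[Tuple[int, str, float]]:
    """Return (start index, peptide, mean accessibility) for exposed windows."""
    if len(sequence) != len(rel_accessibility):
        raise ValueError("one accessibility value is required per residue")
    hits = []
    for start in range(len(sequence) - window + 1):
        values = rel_accessibility[start:start + window]
        mean_value = sum(values) / window
        if mean_value >= min_mean:
            hits.append((start, sequence[start:start + window], mean_value))
    return hits


# Toy 10-residue example with made-up accessibility values (percent).
toy_sequence = "MKTSSVDGGL"
toy_access = [10, 20, 60, 70, 80, 90, 60, 70, 80, 90]
for start, peptide, mean_value in surface_windows(toy_sequence, toy_access, min_mean=60.0):
    print(start, peptide, mean_value)
```

Windows that pass the threshold would still need to be checked for uniqueness within the proteome before being treated as PETs.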

In another embodiment, capture agents are raised that are designed to bind 
with peptides generated by digestion of intact proteins rather than with accessible
peptidic surface regions on the proteins. In this embodiment, it is preferred to 
employ a fragmentation protocol which reproducibly generates all of the PETs in the 
sample under study. 

II. Tools Comprising Capture Agents (Arrays, etc.)

In certain embodiments, to construct arrays, e.g., high-density arrays, of
capture agents for efficient screening of complex chemical or biological samples or
large numbers of compounds, the capture agents need to be immobilized onto a solid
support (e.g., a planar support or a bead). A variety of methods are known in the art
for attaching biological molecules to solid supports. See, generally, Affinity
Techniques, Enzyme Purification: Part B, Meth. Enz. 34 (ed. W. B. Jakoby and M.
Wilchek, Acad. Press, N.Y. 1974) and Immobilized Biochemicals and Affinity
Chromatography, Adv. Exp. Med. Biol. 42 (ed. R. Dunlap, Plenum Press, N.Y.
1974). The following are a few considerations when constructing arrays.

A. Formats and surfaces consideration 

Protein arrays have been designed as a miniaturisation of familiar
immunoassay methods such as ELISA and dot blotting, often utilizing fluorescent
readout, and facilitated by robotics and high throughput detection systems to enable
multiple assays to be carried out in parallel. Common physical supports include
glass slides, silicon, microwells, nitrocellulose or PVDF membranes, and magnetic
and other microbeads. While microdrops of protein delivered onto planar surfaces
are widely used, related alternative architectures include CD centrifugation devices
based on developments in microfluidics [Gyros] and specialized chip designs, such
as engineered microchannels in a plate [The Living Chip™, Biotrove] and tiny 3D
posts on a silicon surface [Zyomyx]. Particles in suspension can also be used as the
basis of arrays, provided they are coded for identification; systems include color
coding for microbeads [Luminex, Bio-Rad] and semiconductor nanocrystals
[QDots™, Quantum Dots], and barcoding for beads [UltraPlex™, Smartbeads] and
multimetal microrods [Nanobarcodes™ particles, Surromed]. Beads can also be
assembled into planar arrays on semiconductor chips [LEAPS technology, BioArray
Solutions].

B. Immobilisation considerations 

The variables in immobilization of proteins such as antibodies include both
the coupling reagent and the nature of the surface being coupled to. Ideally, the
immobilization method used should be reproducible, applicable to proteins of
different properties (size, hydrophilic, hydrophobic), amenable to high throughput
and automation, and compatible with retention of fully functional protein activity.
Orientation of the surface-bound protein is recognized as an important factor in
presenting it to ligand or substrate in an active state; for capture arrays the most
efficient binding results are obtained with orientated capture reagents, which
generally requires site-specific labeling of the protein.

The properties of a good protein array support surface are that it should be
chemically stable before and after the coupling procedures, allow good spot
morphology, display minimal nonspecific binding, not contribute a background in
detection systems, and be compatible with different detection systems.

Both covalent and noncovalent methods of protein immobilization are used
and have various pros and cons. Passive adsorption to surfaces is methodologically
simple, but allows little quantitative or orientational control; it may or may not alter
the functional properties of the protein, and reproducibility and efficiency are
variable. Covalent coupling methods provide a stable linkage, can be applied to a
range of proteins and have good reproducibility; however, orientation may be
variable, chemical derivatization may alter the function of the protein, and a
stable interactive surface is required. Biological capture methods utilizing a tag on
the protein provide a stable linkage and bind the protein specifically and in
reproducible orientation, but the biological reagent must first be immobilized
adequately and the array may require special handling and have variable stability.

Several immobilization chemistries and tags have been described for
fabrication of protein arrays. Substrates for covalent attachment include glass slides
coated with amino- or aldehyde-containing silane reagents [Telechem]. In the
Versalinx™ system [Prolinx], reversible covalent coupling is achieved by
interaction between the protein derivatized with phenyldiboronic acid, and
salicylhydroxamic acid immobilized on the support surface. This also has low
background binding and low intrinsic fluorescence and allows the immobilized
proteins to retain function. Noncovalent binding of unmodified protein occurs within
porous structures such as HydroGel™ [PerkinElmer], based on a 3-dimensional
polyacrylamide gel; this substrate is reported to give a particularly low background
on glass microarrays, with a high capacity and retention of protein function. Widely
used biological capture methods are through biotin / streptavidin or hexahistidine /
Ni interactions, having modified the protein appropriately. Biotin may be conjugated
to a poly-lysine backbone immobilized on a surface such as titanium dioxide
[Zyomyx] or tantalum pentoxide [Zeptosens].

Arenkov et al., for example, have described a way to immobilize proteins
while preserving their function by using microfabricated polyacrylamide gel pads to
capture proteins, and then accelerating diffusion through the matrix by
microelectrophoresis (Arenkov et al. (2000), Anal Biochem 278(2): 123-31). The
patent literature also describes a number of different methods for attaching
biological molecules to solid supports. For example, U.S. Patent No. 4,282,287
describes a method for modifying a polymer surface through the successive
application of multiple layers of biotin, avidin, and extenders. U.S. Patent No.
4,562,157 describes a technique for attaching biochemical ligands to surfaces by
attachment to a photochemically reactive arylazide. U.S. Patent No. 4,681,870
describes a method for introducing free amino or carboxyl groups onto a silica
matrix, in which the groups may subsequently be covalently linked to a protein in
the presence of a carbodiimide. In addition, U.S. Patent No. 4,762,881 describes a
method for attaching a polypeptide chain to a solid substrate by incorporating a
light-sensitive unnatural amino acid group into the polypeptide chain and exposing
the product to low-energy ultraviolet light.

The surface of the support is chosen to possess, or is chemically derivatized
to possess, at least one reactive chemical group that can be used for further
attachment chemistry. There may be optional flexible adapter molecules interposed
between the support and the capture agents. In one embodiment, the capture agents
are physically adsorbed onto the support.

In certain embodiments of the invention, a capture agent is immobilized on a
support in ways that separate the capture agent's PET binding site region and the
region where it is linked to the support. In a preferred embodiment, the capture agent
is engineered to form a covalent bond between one of its termini and an adapter
molecule on the support. Such a covalent bond may be formed through a Schiff-base
linkage, a linkage generated by a Michael addition, or a thioether linkage.

In order to allow attachment by an adapter or directly by a capture agent, the
surface of the substrate may require preparation to create suitable reactive groups.
Such reactive groups could include simple chemical moieties such as amino,
hydroxyl, carboxyl, carboxylate, aldehyde, ester, amide, amine, nitrile, sulfonyl,
phosphoryl, or similarly chemically reactive groups. Alternatively, reactive groups
may comprise more complex moieties that include, but are not limited to,
sulfo-N-hydroxysuccinimide, nitrilotriacetic acid, activated hydroxyl, haloacetyl (e.g.,
bromoacetyl, iodoacetyl), activated carboxyl, hydrazide, epoxy, aziridine,
sulfonylchloride, trifluoromethyldiaziridine, pyridyldisulfide, N-acyl-imidazole,
imidazolecarbamate, succinimidylcarbonate, arylazide, anhydride, diazoacetate,
benzophenone, isothiocyanate, isocyanate, imidoester, fluorobenzene, biotin and
avidin. Techniques of placing such reactive groups on a substrate by mechanical,
physical, electrical or chemical means are well known in the art, such as described
by U.S. Pat. No. 4,681,870, incorporated herein by reference.

Once the initial preparation of reactive groups on the substrate is completed
(if necessary), adapter molecules optionally may be added to the surface of the
substrate to make it suitable for further attachment chemistry. Such adapters
covalently join the reactive groups already on the substrate and the capture agents to
be immobilized, having a backbone of chemical bonds forming a continuous
connection between the reactive groups on the substrate and the capture agents, and
having a plurality of freely rotating bonds along that backbone. Substrate adapters
may be selected from any suitable class of compounds and may comprise polymers
or copolymers of organic acids, aldehydes, alcohols, thiols, amines and the like. For
example, polymers or copolymers of hydroxy-, amino-, or di-carboxylic acids, such
as glycolic acid, lactic acid, sebacic acid, or sarcosine may be employed.
Alternatively, polymers or copolymers of saturated or unsaturated hydrocarbons
such as ethylene glycol, propylene glycol, saccharides, and the like may be
employed. Preferably, the substrate adapter should be of an appropriate length to
allow the capture agent, which is to be attached, to interact freely with molecules in
a sample solution and to form effective binding. The substrate adapters may be
either branched or unbranched, but this and other structural attributes of the adapter
should not interfere stereochemically with relevant functions of the capture agents,
such as a PET interaction. Protection groups, known to those skilled in the art, may
be used to prevent the adapter's end groups from undesired or premature reactions.
For instance, U.S. Pat. No. 5,412,087, incorporated herein by reference, describes
the use of photo-removable protection groups on an adapter's thiol group.

To preserve the binding affinity of a capture agent, it is preferred that the
capture agent be modified so that it binds to the support substrate at a region
separate from the region responsible for interacting with its ligand, i.e., the PET.

Methods of coupling the capture agent to the reactive end groups on the
surface of the substrate or on the adapter include reactions that form linkages such as
thioether bonds, disulfide bonds, amide bonds, carbamate bonds, urea linkages, ester
bonds, carbonate bonds, ether bonds, hydrazone linkages, Schiff-base linkages, and
noncovalent linkages mediated by, for example, ionic or hydrophobic interactions.
The form of reaction will depend, of course, upon the available reactive groups on
both the substrate/adapter and capture agent.

C. Array fabrication consideration



WO 2005/078453 


PCT/US2005/003634 


Preferably, the immobilized capture agents are arranged in an array on a
solid support, such as a silicon-based chip or glass slide. One or more capture agents
designed to detect the presence (and optionally the concentration) of a given known
protein (one previously recognized as existing) are immobilized at each of a plurality
of cells / regions in the array. Thus, a signal at a particular cell / region indicates the
presence of a known protein in the sample, and the identity of the protein is revealed
by the position of the cell. Alternatively, capture agents for one or a plurality of PETs
are immobilized on beads, which optionally are labeled to identify their intended
target analyte, or are distributed in an array such as a microwell plate.
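The positional readout described above, in which the location of a signal reveals the identity of the protein, can be sketched as a simple lookup. In the Python example below, the layout map, the signal values, the threshold, and the function name `detect_proteins` are all hypothetical and serve only to illustrate the principle.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]  # (row, column) of a cell / region on the array


def detect_proteins(layout: Dict[Position, str],
                    signals: Dict[Position, float],
                    threshold: float) -> Dict[str, float]:
    """Map above-threshold signals to the protein assigned to each position."""
    detected = {}
    for position, intensity in signals.items():
        protein = layout.get(position)  # identity is given by position alone
        if protein is not None and intensity >= threshold:
            detected[protein] = intensity
    return detected


# Hypothetical 2 x 2 layout and background-subtracted intensities.
array_layout = {(0, 0): "Cdk8", (0, 1): "SHC", (1, 0): "PP2A"}
array_signals = {(0, 0): 1200.0, (0, 1): 80.0, (1, 0): 950.0, (1, 1): 40.0}
print(detect_proteins(array_layout, array_signals, threshold=500.0))
```

For bead-based formats the same lookup applies, with the bead label taking the place of the (row, column) position.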

In one embodiment, the microarray is high density, with a density over about
100, preferably over about 1000, 1500, 2000, 3000, 4000, 5000 and further
preferably over about 9000, 10000, 11000, 12000 or 13000 spots per cm², formed by
attaching capture agents onto a support surface which has been functionalized to
create a high density of reactive groups or which has been functionalized by the
addition of a high density of adapters bearing reactive groups. In another
embodiment, the microarray comprises a relatively small number of capture agents,
e.g., 10 to 50, selected to detect in a sample various combinations of specific
proteins which generate patterns probative of disease diagnosis, cell type
determination, pathogen identification, etc.

Although the characteristics of the substrate or support may vary depending
upon the intended use, the shape, material and surface modification of the substrates
must be considered. Although it is preferred that the substrate have at least one
surface which is substantially planar or flat, it may also include indentations,
protuberances, steps, ridges, terraces and the like and may have any geometric form
(e.g., cylindrical, conical, spherical, concave surface, convex surface, string, or a
combination of any of these). Suitable substrate materials include, but are not
limited to, glasses, ceramics, plastics, metals, alloys, carbon, papers, agarose, silica,
quartz, cellulose, polyacrylamide, polyamide, and gelatin, as well as other polymer
supports, other solid-material supports, or flexible membrane supports. Polymers
that may be used as substrates include, but are not limited to: polystyrene;
poly(tetra)fluoroethylene (PTFE); polyvinylidenedifluoride; polycarbonate;
polymethylmethacrylate; polyvinylethylene; polyethyleneimine; polyoxymethylene
(POM); polyvinylphenol; polylactides; polymethacrylimide (PMI);
polyalkenesulfone (PAS); polypropylene; polyethylene;
polyhydroxyethylmethacrylate (HEMA); polydimethylsiloxane; polyacrylamide;
polyimide; and various block co-polymers. The substrate can also comprise a
combination of materials, whether water-permeable or not, in multi-layer
configurations. A preferred embodiment of the substrate is a plain 2.5 cm x 7.5 cm
glass slide with surface Si-OH functionalities.

Array fabrication methods include robotic contact printing, ink-jetting,
piezoelectric spotting and photolithography. A number of commercial arrayers are
available [e.g. Packard Bioscience] as well as manual equipment [V & P Scientific].
Bacterial colonies can be robotically gridded onto PVDF membranes for induction
of protein expression in situ.

At the limit of spot size and density are nanoarrays, with spots on the
nanometer spatial scale, enabling thousands of reactions to be performed on a single
chip less than 1 mm square. BioForce Laboratories have developed nanoarrays with
1521 protein spots in 85 sq microns, equivalent to 25 million spots per sq cm, at the
limit for optical detection; their readout methods are fluorescence and atomic force
microscopy (AFM).

A microfluidics system for automated sample incubation with arrays on glass
slides and washing has been codeveloped by NextGen and PerkinElmer
Lifesciences.

For example, capture agent microarrays may be produced by a number of
means, including "spotting" wherein small amounts of the reactants are dispensed to
particular positions on the surface of the substrate. Methods for spotting include, but
are not limited to, microfluidics printing, microstamping (see, e.g., U.S. Pat. No.
5,515,131, U.S. Pat. No. 5,731,152, Martin, B.D. et al. (1998), Langmuir 14:
3971-3975 and Haab, B.B. et al. (2001) Genome Biol 2 and MacBeath, G. et al.
(2000) Science 289: 1760-1763), microcontact printing (see, e.g., PCT Publication
WO 96/29629), inkjet head printing (Roda, A. et al. (2000) BioTechniques 28:
492-496, and Silzel, J.W. et al. (1998) Clin Chem 44: 2036-2043), microfluidic
direct application (Rowe, C.A. et al. (1999) Anal Chem 71: 433-439 and Bernard, A.
et al. (2001), Anal Chem 73: 8-12) and electrospray deposition (Morozov, V.N. et al.
(1999) Anal Chem 71: 1415-1420 and Moerman R. et al. (2001) Anal Chem 73:
2183-2189). Generally, the dispensing device includes calibrating means for
controlling the amount of sample deposition, and may also include a structure for
moving and positioning the sample in relation to the support surface. The volume of
fluid to be dispensed per capture agent in an array varies with the intended use of the
array, and available equipment. Preferably, a volume formed by one dispensation is
less than 100 nL, more preferably less than 10 nL, and most preferably about 1 nL.
The size of the resultant spots will vary as well, and in preferred embodiments these
spots are less than 20,000 µm in diameter, more preferably less than 2,000 µm in
diameter, and most preferably about 150-200 µm in diameter (to yield about 1600
spots per square centimeter). Solutions of blocking agents may be applied to the
microarrays to prevent non-specific binding by reactive groups that have not bound
to a capture agent. Solutions of bovine serum albumin (BSA), casein, or nonfat milk,
for example, may be used as blocking agents to reduce background binding in
subsequent assays.
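
The relationship between spot size and spot density stated above can be checked with a short calculation. The following Python sketch is illustrative only: it assumes a square grid and a hypothetical center-to-center pitch of 250 µm (a 200 µm spot plus a 50 µm gap), neither of which is specified in the text.

```python
import math

UM_PER_CM = 10_000  # 1 cm = 10,000 micrometers


def spots_per_cm2(pitch_um: float) -> int:
    """Spots that fit in a 1 cm x 1 cm area on a square grid.

    pitch_um is the assumed center-to-center spacing (spot diameter plus gap).
    """
    per_side = math.floor(UM_PER_CM / pitch_um)
    return per_side ** 2


def pitch_for_density(density_per_cm2: float) -> float:
    """Center-to-center pitch (in micrometers) implied by a spot density on a square grid."""
    return UM_PER_CM / math.sqrt(density_per_cm2)


if __name__ == "__main__":
    # Assumed 250 um pitch: 40 spots per side of a 1 cm square.
    print(spots_per_cm2(250))
    print(pitch_for_density(1600))
```

Under these assumptions a 250 µm pitch gives 40 x 40 spots per square centimeter, which is consistent with the figure of about 1600 quoted for 150-200 µm spots.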

In preferred embodiments, high-precision, contact-printing robots are used to
pick up small volumes of dissolved capture agents from the wells of a microtiter
plate and to repetitively deliver approximately 1 nL of the solutions to defined
locations on the surfaces of substrates, such as chemically-derivatized glass
microscope slides. Examples of such robots include the GMS 417 Arrayer,
commercially available from Affymetrix of Santa Clara, CA, and a split pin arrayer
constructed according to instructions downloadable from the Brown lab website at
http://cmgm.stanford.edu/pbrown. This results in the formation of microscopic spots
of compounds on the slides. It will be appreciated by one of ordinary skill in the art,
however, that the current invention is not limited to the delivery of 1 nL volumes of
solution, to the use of particular robotic devices, or to the use of chemically
derivatized glass slides, and that alternative means of delivery can be used that are
capable of delivering picoliter or smaller volumes. Hence, in addition to a high
precision array robot, other means for delivering the compounds can be used,
including, but not limited to, ink jet printers, piezoelectric printers, and small
volume pipetting robots.

In one embodiment, the compositions, e.g., microarrays or beads, comprising
the capture agents of the present invention may also comprise other components,
e.g., molecules that recognize and bind specific peptides, metabolites, drugs or drug
candidates, RNA, DNA, lipids, and the like. Thus, an array of capture agents only
some of which bind a PET can comprise an embodiment of the invention.

As an alternative to planar microarrays, bead-based assays combined with
fluorescence-activated cell sorting (FACS) have been developed to perform
multiplexed immunoassays. Fluorescence-activated cell sorting has been routinely
used in diagnostics for more than 20 years. Using mAbs, cell surface markers are
identified on normal and neoplastic cell populations, enabling the classification of
various forms of leukemia or disease monitoring (recently reviewed by Herzenberg
et al. Immunol Today 21 (2000), pp. 383-390).

Bead-based assay systems employ microspheres as solid support for the
capture molecules instead of a planar substrate, which is conventionally used for
microarray assays. In each individual immunoassay, the capture agent is coupled to
a distinct type of microsphere. The reaction takes place on the surface of the
microspheres. The individual microspheres are color-coded by a uniform and
distinct mixture of red and orange fluorescent dyes. After coupling to the appropriate
capture molecule, the different color-coded bead sets can be pooled and the
immunoassay is performed in a single reaction vial. Product formation of the PET
targets with their respective capture agents on the different bead types can be
detected with a fluorescence-based reporter system. The signal intensities are
measured in a flow cytometer, which is able to quantify the amount of captured
targets on each individual bead. Each bead type and thus each immobilized target is
identified using the color code measured by a second fluorescence signal. This
allows the multiplexed quantification of multiple targets from a single sample.
Sensitivity, reliability and accuracy are similar to those observed with standard
microtiter ELISA procedures. Color-coded microspheres can be used to perform up
to a hundred different assay types simultaneously (LabMAP system, Laboratory
Multiple Analyte Profiling, Luminex, Austin, TX, USA). For example, microsphere-based
systems have been used to simultaneously quantify cytokines or
autoantibodies from biological samples (Carson and Vignali, J Immunol Methods
227 (1999), pp. 41-52; Chen et al., Clin Chem 45 (1999), pp. 1693-1694; Fulton et
al., Clin Chem 43 (1997), pp. 1749-1756). Bellisario et al. (Early Hum Dev 64
(2001), pp. 21-25) have used this technology to simultaneously measure antibodies
to three HIV-1 antigens from newborn dried blood-spot specimens.
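
The read-out logic described above (identify each bead by its red/orange color code, then quantify the reporter signal per bead set) can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm of any commercial instrument: the bead-set names, the centroid intensities, the distance cut-off and the use of the median are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical bead sets: name -> (red, orange) classification-dye centroid.
BEAD_SETS = {
    "PET-A": (100.0, 900.0),
    "PET-B": (500.0, 500.0),
    "PET-C": (900.0, 100.0),
}


def classify(red, orange, max_dist=150.0):
    """Return the nearest bead set, or None if the event is too far from all centroids."""
    best_name, best_dist = None, float("inf")
    for name, (r, o) in BEAD_SETS.items():
        dist = ((red - r) ** 2 + (orange - o) ** 2) ** 0.5
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_dist else None


def median_reporter(events):
    """events: iterable of (red, orange, reporter) tuples, one per bead.

    Returns {bead set: median reporter intensity}; unclassified events are dropped.
    """
    grouped = defaultdict(list)
    for red, orange, reporter in events:
        name = classify(red, orange)
        if name is not None:
            grouped[name].append(reporter)
    return {name: median(values) for name, values in grouped.items()}


if __name__ == "__main__":
    events = [(110, 880, 50), (95, 910, 70), (90, 905, 60),
              (510, 495, 200), (300, 300, 999)]
    print(median_reporter(events))
```

The last event in the usage example lies far from every assumed centroid and is therefore discarded rather than assigned to a bead set.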

Bead-based systems have several advantages. As the capture molecules are
coupled to distinct microspheres, each individual coupling event can be perfectly
analyzed. Thus, only quality-controlled beads can be pooled for multiplexed
immunoassays. Furthermore, if an additional parameter has to be included into the
assay, one must only add a new type of loaded bead. No washing steps are required
when performing the assay. The sample is incubated with the different bead types
together with fluorescently labeled detection antibodies. After formation of the
sandwich immuno-complex, only the fluorophores that are definitely bound to the
surface of the microspheres are counted in the flow cytometer.


D. Related non-array formats

An alternative to an array of capture agents is one made through the so-called
"molecular imprinting" technology, in which peptides (e.g. selected PETs) are used
as templates to generate structurally complementary, sequence-specific cavities in a
polymerisable matrix; the cavities can then specifically capture (digested) proteins
which have the appropriate primary amino acid sequence [ProteinPrint™, Aspira
Biosystems]. To illustrate, a chosen PET can be synthesized, and a universal matrix
of polymerizable monomers is allowed to self assemble around the peptide and
crosslinked into place. The PET, or template, is then removed, leaving behind a
cavity complementary in shape and functionality. The cavities can be formed on a
film, discrete sites of an array or the surface of beads. When a sample of fragmented
proteins is exposed to the capture agent, the polymer will selectively retain the target
protein containing the PET and exclude all others. After the washing, only the bound
PET-containing peptides remain. Common staining and tagging procedures, or any
of the non-labeling techniques described below can be used to detect expression
levels and/or post-translational modifications. See, for example, WO 01/61354 A1
and WO 01/61355 A1.

Alternatively, the captured peptides can be eluted for further analysis such as
mass spectrometry analysis. Although several well-established chemical methods for
the sequencing of peptides, polypeptides and proteins are known (for example, the
Edman degradation), mass spectrometric methods are becoming increasingly
important in view of their speed and ease of use. Mass spectrometric methods have
been developed to the point at which they are capable of sequencing peptides in a
mixture even without any prior chemical purification or separation, typically using
electrospray ionization and tandem mass spectrometry (MS/MS). For example, see
Yates III (J. Mass Spectrom, 1998 vol. 33 pp. 1-19), Papayannopoulos (Mass
Spectrom. Rev. 1995, vol. 14 pp. 49-73), and Yates III, McCormack, and Eng (Anal.
Chem. 1996 vol. 68 (17) pp. 534A-540A). Thus, in a typical MS/MS sequencing
experiment, molecular ions of a particular peptide are selected by the first mass
analyzer and fragmented by collisions with neutral gas molecules in a collision cell.
The second mass analyzer is then used to record the fragment ion spectrum that
generally contains enough information to allow at least a partial, and often the
complete, sequence to be determined. See, for example, U.S. Pat. No. 6,489,608,
5,470,753, 5,246,865, all incorporated herein by reference, and related applications
/ patents.
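
To illustrate why a fragment ion spectrum carries sequence information, the following Python sketch computes the singly charged b- and y-ion series expected for a peptide. It is a simplified illustration, not a method taken from the cited references: it assumes unmodified residues, standard monoisotopic residue masses, and only b and y ions with charge +1. The function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(peptide):
    """Return (b, y): dicts mapping ion index to m/z for singly charged ions.

    b[i] holds the first i residues (N-terminal fragment);
    y[j] holds the last j residues (C-terminal fragment, which keeps the water).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = {i: sum(masses[:i]) + PROTON for i in range(1, n)}
    y = {n - i: sum(masses[i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i in sorted(b):
        print(f"b{i}: {b[i]:.4f}   y{i}: {y[i]:.4f}")
```

Because consecutive ions in each series differ by the mass of one residue, the spacing between peaks can be read back as a sequence; leucine and isoleucine have identical masses and cannot be distinguished this way.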

Another methodology which can be used diagnostically and in expression
profiling is the ProteinChip® array [Ciphergen], in which solid phase
chromatographic surfaces bind proteins with similar characteristics of charge or
hydrophobicity from mixtures such as plasma or tumor extracts, and SELDI-TOF
mass spectrometry is used to detect the retained proteins. The ProteinChip® is
credited with the ability to identify novel disease markers. However, this technology
differs from the protein arrays under discussion here since, in general, it does not
involve immobilization of individual proteins for detection of specific ligand
interactions.

E. Single Assay Format 
PET-specific affinity capture agents can also be used in a single assay
format. For example, such agents can be used to develop a better assay for detecting
circulating agents, such as PSA, by providing increased sensitivity, dynamic range
and/or recovery rate. For instance, the single assays can have functional performance
characteristics which exceed traditional ELISA and other immunoassays, such as
one or more of the following: a regression coefficient (R²) of 0.95 or greater for a
reference standard, e.g., a comparable control sample, more preferably an R² greater
than 0.97, 0.99 or even 0.995; a recovery rate of at least 50 percent, and more
preferably at least 60, 75, 80 or even 90 percent; a positive predictive value for
occurrence of the protein in a sample of at least 90 percent, more preferably at least
95, 98 or even 99 percent; a diagnostic sensitivity (DSN) for occurrence of the
protein in a sample of 99 percent or higher, more preferably at least 99.5 or even
99.8 percent; a diagnostic specificity (DSP) for occurrence of the protein in a sample
of 99 percent or higher, more preferably at least 99.5 or even 99.8 percent.
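
The performance characteristics listed above follow standard definitions, which the Python sketch below makes explicit. The text does not give formulas, so the definitions used here (R² from an ordinary least-squares line, recovery as measured over spiked amount, and PPV, DSN and DSP from a 2 x 2 confusion table) are the conventional ones and are stated as assumptions; the function names are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for an ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def recovery_rate(measured, spiked):
    """Percent of a known spiked amount that the assay reports."""
    return 100.0 * measured / spiked


def diagnostic_metrics(tp, fp, tn, fn):
    """Return (PPV, sensitivity, specificity) as fractions between 0 and 1."""
    ppv = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # DSN
    specificity = tn / (tn + fp)   # DSP
    return ppv, sensitivity, specificity


if __name__ == "__main__":
    print(r_squared([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]))
    print(recovery_rate(45.0, 50.0))
    print(diagnostic_metrics(tp=99, fp=1, tn=99, fn=1))
```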


III. Methods of Detecting Binding Events 

The capture agents of the invention, as well as compositions, e.g.,
microarrays or beads, comprising these capture agents have a wide range of
applications in the health care industry, e.g., in therapy, in clinical diagnostics, in in
vivo imaging or in drug discovery. The capture agents of the present invention also
have industrial and environmental applications, e.g., in environmental diagnostics,
industrial diagnostics, food safety, toxicology, catalysis of reactions, or high-throughput
screening; as well as applications in the agricultural industry and in basic
research, e.g., protein sequencing.

The capture agents of the present invention are a powerful analytical tool that
enables a user to detect a specific protein, or group of proteins of interest, present
within complex samples. In addition, the invention allows for efficient and rapid
analysis of samples, sample conservation, and direct sample comparison. The
invention enables "multi-parametric" analysis of protein samples. As used herein, a
"multi-parametric" analysis of a protein sample is intended to include an analysis of
a protein sample based on a plurality of parameters. For example, a protein sample
may be contacted with a plurality of PETs, each of the PETs being able to detect a
different protein within the sample. Based on the combination and, preferably, the
relative concentration of the proteins detected in the sample, the skilled artisan
would be able to determine the identity of a sample, diagnose a disease or
pre-disposition to a disease, or determine the stage of a disease.

The capture agents of the present invention may be used in any method
suitable for detection of a protein or a polypeptide, such as, for example, in
immunoprecipitations, immunocytochemistry, Western Blots or nuclear magnetic
resonance spectroscopy (NMR).

To detect the presence of a protein that interacts with a capture agent, a
variety of art-known methods may be used. The protein to be detected may be
labeled with a detectable label, and the amount of bound label directly measured.
The term "label" is used herein in a broad sense to refer to agents that are capable of
providing a detectable signal, either directly or through interaction with one or more
additional members of a signal producing system. Labels that are directly detectable
and may find use in the present invention include, for example, fluorescent labels
such as fluorescein, rhodamine, BODIPY, cyanine dyes (e.g. from Amersham
Pharmacia), Alexa dyes (e.g. from Molecular Probes, Inc.), fluorescent dye
phosphoramidites, beads, chemiluminescent compounds, colloidal particles, and
the like. Suitable fluorescent dyes are known in the art, including
fluoresceinisothiocyanate (FITC); rhodamine and rhodamine derivatives; Texas Red;
phycoerythrin; allophycocyanin; 6-carboxyfluorescein (6-FAM); 2',7'-dimethoxy-4',5'-dichloro
carboxyfluorescein (JOE); 6-carboxy-X-rhodamine (ROX);
6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX); 5-carboxyfluorescein (5-FAM);
N,N,N',N'-tetramethyl carboxyrhodamine (TAMRA); sulfonated rhodamine; Cy3;
Cy5, etc. Radioactive isotopes, such as 35S, 32P, 3H, 125I, and the like, can also be
used for labeling. In addition, labels may also include near-infrared dyes (Wang et
al., Anal. Chem., 72:5907-5917 (2000)), upconverting phosphors (Hampl et al., Anal.
Biochem., 288:176-187 (2001)), DNA dendrimers (Stears et al., Physiol. Genomics 3:
93-99 (2000)), quantum dots (Bruchez et al., Science 281:2013-2016 (1998)), latex
beads (Okana et al., Anal. Biochem. 202:120-125 (1992)), selenium particles
(Stimpson et al., Proc. Natl. Acad. Sci. 92:6379-6383 (1995)), and europium
nanoparticles (Harma et al., Clin. Chem. 47:561-568 (2001)). The label is one that
preferably does not provide a variable signal, but instead provides a constant and
reproducible signal over a given period of time.

A very useful labeling agent is water-soluble quantum dots, or so-called
"functionalized nanocrystals" or "semiconductor nanocrystals" as described in U.S.
Pat. No. 6,114,038. Generally, quantum dots can be prepared with relative
monodispersity (e.g., with the diameter of the core varying by less than
approximately 10% between quantum dots in the preparation), as has been described
previously (Bawendi et al., 1993, J. Am. Chem. Soc. 115:8706). Examples of quantum
dots are known in the art to have a core selected from the group consisting of CdSe,
CdS, and CdTe (collectively referred to as "CdX") (see, e.g., Norris et al., 1996,
Physical Review B. 53:16338-16346; Nirmal et al., 1996, Nature 383:802-804;
Empedocles et al., 1996, Physical Review Letters 77:3873-3876; Murray et al., 1996,
Science 270:1355-1338; Effros et al., 1996, Physical Review B. 54:4843-4856; Sacra
et al., 1996, J. Chem. Phys. 103:5236-5245; Murakoshi et al., 1998, J. Colloid
Interface Sci. 203:225-228; Optical Materials and Engineering News, 1995, Vol. 5,
No. 12; and Murray et al., 1993, J. Am. Chem. Soc. 115:8706-8714; the disclosures of
which are hereby incorporated by reference).

CdX quantum dots have been passivated with an inorganic coating ("shell")
uniformly deposited thereon. Passivating the surface of the core quantum dot can
result in an increase in the quantum yield of the luminescence emission, depending
on the nature of the inorganic coating. The shell which is used to passivate the
quantum dot is preferably comprised of YZ wherein Y is Cd or Zn, and Z is S or Se.
Quantum dots having a CdX core and a YZ shell have been described in the art (see,
e.g., Danek et al., 1996, Chem. Mater. 8:173-179; Dabbousi et al., 1997, J. Phys.
Chem. B 101:9463; Rodriguez-Viejo et al., 1997, Appl. Phys. Lett. 70:2132-2134;
Peng et al., 1997, J. Am. Chem. Soc. 119:7019-7029; 1996, Phys. Review B.
53:16338-16346; the disclosures of which are hereby incorporated by reference).
However, the above described quantum dots, passivated using an inorganic shell,
have only been soluble in organic, non-polar (or weakly polar) solvents. To make
quantum dots useful in biological applications, it is desirable that the quantum dots
are water-soluble. "Water-soluble" is used herein to mean sufficiently soluble or
suspendable in an aqueous-based solution, such as in water or water-based solutions
or buffer solutions, including those used in biological or molecular detection
systems as known by those skilled in the art.

U.S. Pat. No. 6,114,038 provides a composition comprising functionalized
nanocrystals for use in non-isotopic detection systems. The composition comprises
quantum dots (capped with a layer of a capping compound) that are water-soluble
and functionalized by operably linking, in a successive manner, one or more
additional compounds. In a preferred embodiment, the one or more additional
compounds form successive layers over the nanocrystal. More particularly, the
functionalized nanocrystals comprise quantum dots capped with the capping
compound, and have at least a diaminocarboxylic acid which is operatively linked to
the capping compound. Thus, the functionalized nanocrystals may have a first layer
comprising the capping compound, and a second layer comprising a
diaminocarboxylic acid; and may further comprise one or more successive layers
including a layer of amino acid, a layer of affinity ligand, or multiple layers
comprising a combination thereof. The composition comprises a class of quantum
dots that can be excited with a single wavelength of light resulting in detectable
luminescence emissions of high quantum yield and with discrete luminescence
peaks. Such functionalized nanocrystals may be used to label capture agents of the
instant invention for their use in the detection and/or quantitation of the binding
events.

U.S. Pat. No. 6,326,144 describes quantum dots (QDs) having a
characteristic spectral emission, which is tunable to a desired energy by selection of
the particle size of the quantum dot. For example, a 2 nanometer quantum dot emits
green light, while a 5 nanometer quantum dot emits red light. The emission spectra
of quantum dots have linewidths as narrow as 25-30 nm depending on the size
heterogeneity of the sample, and lineshapes that are symmetric, gaussian or nearly
gaussian with an absence of a tailing region. The combination of tunability, narrow
linewidths, and symmetric emission spectra without a tailing region provides for
high resolution of multiply-sized quantum dots within a system and enables
researchers to examine simultaneously a variety of biological moieties tagged with
QDs. In addition, the range of excitation wavelengths of the nanocrystal quantum
dots is broad and can be higher in energy than the emission wavelengths of all
available quantum dots. Consequently, this allows the simultaneous excitation of all
quantum dots in a system with a single light source, usually in the ultraviolet or blue
region of the spectrum. QDs are also more robust than conventional organic
fluorescent dyes and are more resistant to photobleaching than the organic dyes. The
robustness of the QD also alleviates the problem of contamination of the system
being examined by degradation products of the organic dyes. These QDs can be used
for labeling capture agents of protein, nucleic acid, and other biological molecules in
nature. Cadmium Selenide quantum dot nanocrystals are available from Quantum
Dot Corporation of Hayward, California.

Alternatively, the sample to be tested is not labeled, but a second stage
labeled reagent is added in order to detect the presence or quantitate the amount of
protein in the sample. Such "sandwich based" methods of detection have the
disadvantage that two capture agents must be developed for each protein, one to
capture the PET and one to label it once captured. Such methods have the advantage
of an inherently improved signal to noise ratio, because they exploit two binding
reactions at different points on a peptide; the presence and/or concentration of the
protein can therefore be measured with more accuracy and precision.

In yet another embodiment, the subject capture array can be a "virtual
array". For example, a virtual array can be generated in which antibodies or other
capture agents are immobilized on beads whose identity, with respect to the
particular PET each is specific for as a consequence of the associated capture agent,
is encoded by a particular ratio of two or more covalently attached dyes. Mixtures of
encoded PET-beads are added to a sample, resulting in capture of the PET entities
recognized by the immobilized capture agents.

To quantitate the captured species, fluorescently labeled antibodies that bind
the captured PET (a sandwich assay), or a fluorescently labeled ligand for the
capture agent (a competitive binding assay), are added to the mix. In one
embodiment, the labeled ligand is a labeled PET that competes with the analyte PET
for binding to the capture agent. The beads are then introduced into an instrument,
such as a flow cytometer, that reads the intensity of the various fluorescence signals
on each bead, and the identity of the bead can be determined by measuring the ratio
of the dyes (Figure 3). This technology is relatively fast and efficient, and can be
adapted by researchers to monitor almost any set of PETs of interest.

In another embodiment, an array of capture agents is embedded in a matrix
suitable for ionization (such as described in Fung et al. (2001) Curr. Opin.
Biotechnol. 12:65-69). After application of the sample and removal of unbound
molecules (by washing), the retained PET proteins are analyzed by mass
spectrometry. In some instances, further proteolytic digestion of the bound species
with trypsin may be required before ionization, particularly if electrospray is the
means for ionizing the peptides.

All the above named reagents may be used to label the capture agents.
Preferably, the capture agent to be labeled is combined with an activated dye that
reacts with a group present on the protein to be detected, e.g., amine groups, thiol
groups, or aldehyde groups.

The label may also be a covalently bound enzyme capable of providing a
detectable product signal after addition of suitable substrate. Examples of suitable
enzymes for use in the present invention include horseradish peroxidase, alkaline
phosphatase, malate dehydrogenase and the like.

Enzyme-Linked Immunosorbent Assay (ELISA) may also be used for
detection of a protein that interacts with a capture agent. In an ELISA, the indicator
molecule is covalently coupled to an enzyme and may be quantified by determining
with a spectrophotometer the initial rate at which the enzyme converts a clear
substrate to a correlated product. Methods for performing ELISA are well known in
the art and described in, for example, Perlmann, H. and Perlmann, P. (1994).
Enzyme-Linked Immunosorbent Assay. In: Cell Biology: A Laboratory Handbook.
San Diego, CA, Academic Press, Inc., 322-328; Crowther, J.R. (1995). Methods in
Molecular Biology, Vol. 42-ELISA: Theory and Practice. Humana Press, Totowa,
NJ.; and Harlow, E. and Lane, D. (1988). Antibodies: A Laboratory Manual. Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 553-612, the contents of
each of which are incorporated by reference. Sandwich (capture) ELISA may also be
used to detect a protein that interacts with two capture agents. The two capture
agents may be able to specifically interact with two PETs that are present on the
same peptide (e.g., the peptide which has been generated by fragmentation of the
sample of interest, as described above). Alternatively, the two capture agents may be
able to specifically interact with one PET and one non-unique amino acid sequence,
both present on the same peptide (e.g., the peptide which has been generated by
fragmentation of the sample of interest, as described above). Sandwich ELISAs for
the quantitation of proteins of interest are especially valuable when the concentration
of the protein in the sample is low and/or the protein of interest is present in a
sample that contains high concentrations of contaminating proteins.
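
The initial-rate read-out mentioned above can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the initial rate is taken as the least-squares slope of absorbance against time over the first few readings, and the rate is converted to a concentration by linear interpolation between standards. The number of early points, the standards and the function names are hypothetical and not taken from the cited ELISA references.

```python
def initial_rate(times_s, absorbances, n_points=4):
    """Least-squares slope (absorbance units per second) over the first n_points readings."""
    t = times_s[:n_points]
    a = absorbances[:n_points]
    mean_t = sum(t) / len(t)
    mean_a = sum(a) / len(a)
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (ai - mean_a) for ti, ai in zip(t, a))
    return sxy / sxx


def concentration_from_rate(rate, standards):
    """Linear interpolation on a standard curve.

    standards: list of (concentration, rate) pairs with strictly increasing rates.
    """
    for (c0, r0), (c1, r1) in zip(standards, standards[1:]):
        if r0 <= rate <= r1:
            return c0 + (c1 - c0) * (rate - r0) / (r1 - r0)
    raise ValueError("rate is outside the range covered by the standards")


if __name__ == "__main__":
    times = [0, 10, 20, 30, 40, 50]
    # Readings are linear at first, then flatten as substrate is consumed.
    readings = [0.00, 0.02, 0.04, 0.06, 0.07, 0.075]
    rate = initial_rate(times, readings)
    # Hypothetical two-point standard curve: (concentration, rate).
    print(rate, concentration_from_rate(rate, [(0.0, 0.0), (10.0, 0.004)]))
```

Restricting the fit to early time points is the design choice that matters here: later readings underestimate the rate once the substrate is depleted.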

A fully-automated, microarray-based approach for high-throughput ELISAs
was described by Mendoza et al. (BioTechniques 27:778-780,782-786,788, 1999).
This system consisted of an optically flat glass plate with 96 wells separated by a
Teflon mask. More than a hundred capture molecules were immobilized in each
well. Sample incubation, washing and fluorescence-based detection were performed
with an automated liquid pipettor. The microarrays were quantitatively imaged with
a scanning charge-coupled device (CCD) detector. Thus, the feasibility of multiplex
detection of arrayed antigens in a high-throughput fashion using marker antigens
was successfully demonstrated. In addition, Silzel et al. (Clin Chem 44 pp.
2036-2043, 1998) demonstrated that multiple IgG subclasses can be detected
simultaneously using microarray technology. Wiese et al. (Clin Chem 47 pp.
1451-1457, 2001) were able to measure prostate-specific antigen (PSA),
α(1)-antichymotrypsin-bound PSA and interleukin-6 in a microarray format. Arenkov et
al. (supra) carried out microarray sandwich immunoassays and direct antigen or
antibody detection experiments using a modified polyacrylamide gel as substrate for
immobilized capture molecules.

Most of the microarray assay formats described in the art rely on
chemiluminescence- or fluorescence-based detection methods. A further
improvement with regard to sensitivity involves the application of fluorescent labels
and waveguide technology. A fluorescence-based array immunosensor was
developed by Rowe et al. (Anal Chem 71 (1999), pp. 433-439; and Biosens
Bioelectron 15 (2000), pp. 579-589) and applied for the simultaneous detection of
clinical analytes using the sandwich immunoassay format. Biotinylated capture
antibodies were immobilized on avidin-coated waveguides using a flow-chamber
module system. Discrete regions of capture molecules were vertically arranged on
the surface of the waveguide. Samples of interest were incubated to allow the targets
to bind to their capture molecules. Captured targets were then visualized with
appropriate fluorescently labeled detection molecules. This array immunosensor was
shown to be appropriate for the detection and measurement of targets at
physiologically relevant concentrations in a variety of clinical samples.

A further increase in the sensitivity using waveguide technology was
achieved with the development of the planar waveguide technology (Duveneck et
al., Sens Actuators B B38 (1997), pp. 88-95). Thin-film waveguides are generated
from a high-refractive material such as Ta2O5 that is deposited on a transparent
substrate. Laser light of desired wavelength is coupled to the planar waveguide by
means of a diffractive grating. The light propagates in the planar waveguide and an
area of more than a square centimeter can be homogeneously illuminated. At the
surface, the propagating light generates a so-called evanescent field. This extends
into the solution and activates only fluorophores that are bound to the surface.
Fluorophores in the surrounding solution are not excited. Close to the surface, the
excitation field intensities can be a hundred times higher than those achieved with
standard confocal excitation. A CCD camera is used to identify signals
simultaneously across the entire area of the planar waveguide. Thus, the
immobilization of the capture molecules in a microarray format on the planar
waveguide allows the performance of highly sensitive miniaturized and parallelized
immunoassays. This system was successfully employed to detect interleukin-6 at
concentrations as low as 40 fM and has the additional advantage that the assay can
be performed without washing steps that are usually required to remove unbound
detection molecules (Weinberger et al., Pharmacogenomics 1 (2000), pp. 395-416).

Alternative strategies pursued to increase sensitivity are based on signal
amplification procedures. For example, immunoRCA (immuno rolling circle
amplification) involves an oligonucleotide primer that is covalently attached to a
detection molecule (such as a second capture agent in a sandwich-type assay
format). Using circular DNA as template, which is complementary to the attached
oligonucleotide, DNA polymerase will extend the attached oligonucleotide and
generate a long DNA molecule consisting of hundreds of copies of the circular
DNA, which remains attached to the detection molecule. The incorporation of
thousands of fluorescently labeled nucleotides will generate a strong signal.
Schweitzer et al. (Proc Natl Acad Sci USA 97 (2000), pp. 10113-10119) have
evaluated this detection technology for use in microarray-based assays. Sandwich
immunoassays for huIgE and prostate-specific antigens were performed in a
microarray format. The antigens could be detected at femtomolar concentrations and
it was possible to score single, specifically captured antigens by counting discrete
fluorescent signals that arose from the individual antibody-antigen complexes. The
authors demonstrated that immunoassays employing rolling circle DNA
amplification are a versatile platform for the ultra-sensitive detection of antigens and
thus are well suited for use in protein microarray technology.

Radioimmunoassays (RIA) may also be used for detection of a protein that 
interacts with a capture agent. In a RIA, the indicator molecule is labeled with a 
radioisotope and it may be quantified by counting radioactive decay events in a
scintillation counter. Methods for performing direct or competitive RIA are well
known in the art and described in, for example, Cell Biology: A Laboratory 
Handbook. San Diego, CA, Academic Press, Inc., the contents of which are 
incorporated herein by reference. 

Other immunoassays that are commonly used to quantitate the levels of proteins in
cell samples, and that are well known in the art, can be adapted for use in the instant
invention. The invention is not limited to a particular assay procedure, and therefore
is intended to include both homogeneous and heterogeneous procedures. Exemplary
other immunoassays which can be conducted according to the invention include
fluorescence polarization immunoassay (FPIA), fluorescence immunoassay (FIA),
enzyme immunoassay (EIA), and nephelometric inhibition immunoassay (NIA). An
indicator moiety, or label group, can be attached to the subject antibodies and is
selected so as to meet the needs of various uses of the method, which are often
dictated by the availability of assay equipment and compatible immunoassay
procedures. General techniques to be used in performing the various immunoassays
noted above are known to those of ordinary skill in the art. In one embodiment, the
determination of protein level in a biological sample may be performed by a
microarray analysis (protein chip).

In several other embodiments, detection of the presence of a protein that
interacts with a capture agent may be achieved without labeling. For example,
determining the ability of a protein to bind to a capture agent can be accomplished
using a technology such as real-time Biomolecular Interaction Analysis (BIA).
Sjolander, S. and Urbaniczky, C. (1991) Anal. Chem. 63:2338-2345 and Szabo et al.
(1995) Curr. Opin. Struct. Biol. 5:699-705. As used herein, "BIA" is a technology
for studying biospecific interactions in real time, without labeling any of the
interactants (e.g., BIAcore).

In another embodiment, a biosensor with a special diffractive grating surface
may be used to detect / quantitate binding between non-labeled PET-containing
peptides in a treated (digested) biological sample and immobilized capture agents at
the surface of the biosensor. The technology is described in more detail in
B. Cunningham, P. Li, B. Lin, J. Pepper, "Colorimetric resonant reflection as a
direct biochemical assay technique," Sensors and Actuators B, Volume 81, p. 316-328,
Jan 5 2002, and in PCT No. WO 02/061429 A2 and US 2003/0032039. Briefly,
a guided mode resonant phenomenon is used to produce an optical structure that,
when illuminated with collimated white light, is designed to reflect only a single
wavelength (color). When molecules are attached to the surface of the biosensor, the
reflected wavelength (color) is shifted due to the change of the optical path of light
that is coupled into the grating. By linking receptor molecules to the grating surface,
complementary binding molecules can be detected / quantitated without the use of
any kind of fluorescent probe or particle label. The spectral shifts may be analyzed
to determine the expression data provided, and to indicate the presence or absence of
a particular indication.

The biosensor typically comprises: a two-dimensional grating comprised of a
material having a high refractive index, a substrate layer that supports the
two-dimensional grating, and one or more detection probes immobilized on the surface
of the two-dimensional grating opposite of the substrate layer. When the biosensor is
illuminated, a resonant grating effect is produced on the reflected radiation spectrum.
The depth and period of the two-dimensional grating are less than the wavelength of
the resonant grating effect.
A narrow band of optical wavelengths can be reflected from the biosensor
when it is illuminated with a broad band of optical wavelengths. The substrate can
comprise glass, plastic or epoxy. The two-dimensional grating can comprise a
material selected from the group consisting of zinc sulfide, titanium dioxide,
tantalum oxide, and silicon nitride.

The substrate and two-dimensional grating can optionally comprise a single 
unit. The surface of the single unit comprising the two-dimensional grating is coated 
with a material having a high refractive index, and the one or more detection probes 
are immobilized on the surface of the material having a high refractive index
opposite of the single unit. The single unit can be comprised of a material selected
from the group consisting of glass, plastic, and epoxy. 

The biosensor can optionally comprise a cover layer on the surface of the 
two-dimensional grating opposite of the substrate layer. The one or more detection 
probes are immobilized on the surface of the cover layer opposite of the
two-dimensional grating. The cover layer can comprise a material that has a lower
refractive index than the high refractive index material of the two-dimensional
grating. For example, a cover layer can comprise glass, epoxy, or plastic.

A two-dimensional grating can be comprised of a repeating pattern of shapes 
selected from the group consisting of lines, squares, circles, ellipses, triangles,
trapezoids, sinusoidal waves, ovals, rectangles, and hexagons. The repeating pattern
of shapes can be arranged in a linear grid, i.e., a grid of parallel lines, a rectangular
grid, or a hexagonal grid. The two-dimensional grating can have a period of about
0.01 microns to about 1 micron and a depth of about 0.01 microns to about 1 micron.

To illustrate, biochemical interactions occurring on a surface of a
colorimetric resonant optical biosensor embedded into a surface of a microarray
slide, microtiter plate or other device, can be directly detected and measured on the
sensor's surface without the use of fluorescent tags or colorimetric labels. The sensor
surface contains an optical structure that, when illuminated with collimated white
light, is designed to reflect only a narrow band of wavelengths (color). The narrow
band of wavelengths is described as a wavelength "peak." The "peak wavelength value"
(PWV) changes when biological material is deposited on or removed from the sensor
surface, such as when binding occurs. Such binding-induced change of PWV can be
measured using a measurement instrument disclosed in US2003/0032039.

In one embodiment, the instrument illuminates the biosensor surface by
directing collimated white light onto the sensor structure. The illuminating light
may take the form of a spot of collimated light. Alternatively, the light is generated
in the form of a fan beam. The instrument collects light reflected from the
illuminated biosensor surface. The instrument may gather this reflected light from
multiple locations on the biosensor surface simultaneously. The instrument can
include a plurality of illumination probes that direct the light to a discrete number of
positions across the biosensor surface. The instrument measures the Peak
Wavelength Values (PWVs) of separate locations within the biosensor-embedded
microtiter plate using a spectrometer. In one embodiment, the spectrometer is a
single-point spectrometer. Alternatively, an imaging spectrometer is used. The
spectrometer can produce a PWV image map of the sensor surface. In one
embodiment, the measuring instrument spatially resolves PWV images with less
than 200 micron resolution.
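
By way of illustration only, the PWV measurement described above might be reduced to numbers as in the following Python sketch. It locates the brightest sample of a reflectance spectrum, refines the peak position with a three-point parabolic interpolation, and reports the PWV change between two spectra taken at the same sensor location. The function names, the evenly spaced wavelength grid, and the synthetic spectra are assumptions made for this example; they do not describe the instrument of US2003/0032039.

```python
def peak_wavelength(wavelengths_nm, reflectance):
    """Estimate the peak wavelength value (PWV) of one reflectance spectrum.

    Picks the brightest sample and refines it with three-point parabolic
    interpolation. Assumes evenly spaced wavelength samples (an assumption of
    this sketch, not a requirement stated in the text).
    """
    n = len(wavelengths_nm)
    if n != len(reflectance) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 samples")
    i = max(range(n), key=lambda k: reflectance[k])
    if i == 0 or i == n - 1:
        return float(wavelengths_nm[i])  # peak at the edge: no refinement possible
    y0, y1, y2 = reflectance[i - 1], reflectance[i], reflectance[i + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature == 0:
        return float(wavelengths_nm[i])  # flat top: keep the sampled maximum
    offset = 0.5 * (y0 - y2) / curvature  # vertex position in sample steps
    step = wavelengths_nm[i + 1] - wavelengths_nm[i]
    return wavelengths_nm[i] + offset * step


def pwv_shift(wavelengths_nm, spectrum_before, spectrum_after):
    """PWV change between two spectra recorded at the same sensor location."""
    return (peak_wavelength(wavelengths_nm, spectrum_after)
            - peak_wavelength(wavelengths_nm, spectrum_before))


# Illustrative synthetic spectra (made-up numbers, not measured data).
grid = [850.0 + 0.5 * k for k in range(41)]
before = [1.0 - (w - 860.2) ** 2 / 100.0 for w in grid]
after = [1.0 - (w - 860.9) ** 2 / 100.0 for w in grid]
shift_nm = pwv_shift(grid, before, after)
```

Applying the same function to every location of a plate or slide would give a PWV map of the kind the imaging spectrometer is said to produce; a real instrument would likely need noise filtering that this sketch omits.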

In one embodiment, a subwavelength structured surface (SWS) may be used
to create a sharp optical resonant reflection at a particular wavelength that can be
used to track with high sensitivity the interaction of biological materials, such as
specific binding substances or binding partners or both. A colorimetric resonant
diffractive grating surface acts as a surface binding platform for specific binding
substances (such as immobilized capture agents of the instant invention). SWS is an
unconventional type of diffractive optic that can mimic the effect of thin-film
coatings. (Peng & Morris, "Resonant scattering from two-dimensional gratings," J.
Opt. Soc. Am. A, Vol. 13, No. 5, p. 993, May; Magnusson & Wang, "New principle
for optical filters," Appl. Phys. Lett., 61, No. 9, p. 1022, August, 1992; Peng &
Morris, "Experimental demonstration of resonant anomalies in diffraction from
two-dimensional gratings," Optics Letters, Vol. 21, No. 8, p. 549, April, 1996). A SWS
structure contains a surface-relief, two-dimensional grating in which the grating
period is small compared to the wavelength of incident light so that no diffractive
orders other than the reflected and transmitted zeroth orders are allowed to
propagate. A SWS surface narrowband filter can comprise a two-dimensional
grating sandwiched between a substrate layer and a cover layer that fills the grating
grooves. Optionally, a cover layer is not used. When the effective index of refraction
of the grating region is greater than that of the substrate or the cover layer, a waveguide is
created. When a filter is designed accordingly, incident light passes into the
waveguide region. A two-dimensional grating structure selectively couples light at a
narrow band of wavelengths into the waveguide. The light propagates only a short
distance (on the order of 10-100 micrometers), undergoes scattering, and couples
with the forward- and backward-propagating zeroth-order light. This sensitive
coupling condition can produce a resonant grating effect on the reflected radiation
spectrum, resulting in a narrow band of reflected or transmitted wavelengths
(colors). The depth and period of the two-dimensional grating are less than the
wavelength of the resonant grating effect.
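
The subwavelength condition stated above (no propagating orders other than the zeroth) can be checked numerically. The Python sketch below is a simplified illustration that applies the textbook one-dimensional grating equation rather than a full two-dimensional treatment, and it ignores guided modes inside the grating layer. The function names, the default normal incidence, and the example wavelength, periods, and refractive indices are assumptions chosen for this example only.

```python
import math


def propagating_orders(wavelength_nm, period_nm, n_medium,
                       n_incident=1.0, incidence_deg=0.0, max_order=10):
    """Diffraction orders m that can propagate in a medium of index n_medium.

    One-dimensional grating equation:
        n_medium * sin(theta_m) = n_incident * sin(theta_i) + m * wavelength / period
    Order m propagates only if the right-hand side has magnitude below n_medium.
    """
    tangential = n_incident * math.sin(math.radians(incidence_deg))
    return [m for m in range(-max_order, max_order + 1)
            if abs(tangential + m * wavelength_nm / period_nm) < n_medium]


def is_subwavelength(wavelength_nm, period_nm, n_cover, n_substrate):
    """True when only the zeroth order propagates on both sides of the grating."""
    reflected = propagating_orders(wavelength_nm, period_nm, n_cover,
                                   n_incident=n_cover)
    transmitted = propagating_orders(wavelength_nm, period_nm, n_substrate,
                                     n_incident=n_cover)
    return reflected == [0] and transmitted == [0]


# Hypothetical example: 850 nm light, air cover (n = 1.0), glass-like substrate (n = 1.5).
fine_grating_ok = is_subwavelength(850.0, 500.0, 1.0, 1.5)
coarse_grating_ok = is_subwavelength(850.0, 1000.0, 1.0, 1.5)
```

In this example the 500 nm period satisfies the condition, while the 1000 nm period lets the first orders propagate in air and therefore does not.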

The reflected or transmitted color of this structure can be modulated by the
addition of molecules such as capture agents or their PET-containing binding
partners or both, to the upper surface of the cover layer or the two-dimensional
grating surface. The added molecules increase the optical path length of incident
radiation through the structure, and thus modify the wavelength (color) at which
maximum reflectance or transmittance will occur. Thus in one embodiment, a
biosensor, when illuminated with white light, is designed to reflect only a single
wavelength. When specific binding substances are attached to the surface of the
biosensor, the reflected wavelength (color) is shifted due to the change of the optical
path of light that is coupled into the grating. By linking specific binding substances
to a biosensor surface, complementary binding partner molecules can be detected
without the use of any kind of fluorescent probe or particle label. The detection
technique is capable of resolving changes of, for example, about 0.1 nm thickness of
protein binding, and can be performed with the biosensor surface either immersed in
fluid or dried. This PWV change can be detected by a detection system that consists of,
for example, a light source that illuminates a small spot of a biosensor at normal
incidence through, for example, a fiber optic probe. A spectrometer collects the
reflected light through, for example, a second fiber optic probe also at normal
incidence. Because no physical contact occurs between the excitation/detection
system and the biosensor surface, no special coupling prisms are required. The
biosensor can, therefore, be adapted to a commonly used assay platform including,
for example, microtiter plates and microarray slides. A spectrometer reading can be
performed in several milliseconds; thus it is possible to efficiently measure a large
number of molecular interactions taking place in parallel upon a biosensor surface,
and to monitor reaction kinetics in real time.

Various embodiments and variations of the biosensor described above can be
found in US2003/0032039, incorporated herein by reference in its entirety.

One or more specific capture agents may be immobilized on the two-dimensional
grating or cover layer, if present. Immobilization may occur by any of
the above described methods. Suitable capture agents can be, for example, a nucleic
acid, polypeptide, antigen, polyclonal antibody, monoclonal antibody, single chain
antibody (scFv), F(ab) fragment, F(ab')2 fragment, Fv fragment, small organic
molecule, or even a cell, virus, or bacterium. A biological sample can be obtained and/or
derived from, for example, blood, plasma, serum, gastrointestinal secretions,
homogenates of tissues or tumors, synovial fluid, feces, saliva, sputum, cyst fluid,
amniotic fluid, cerebrospinal fluid, peritoneal fluid, lung lavage fluid, semen,
lymphatic fluid, tears, or prostatic fluid. Preferably, one or more specific capture
agents are arranged in a microarray of distinct locations on a biosensor. A
microarray of capture agents comprises one or more specific capture agents on a
surface of a biosensor such that a biosensor surface contains a plurality of distinct
locations, each with a different capture agent or with a different amount of a specific
capture agent. For example, an array can comprise 1, 10, 100, 1,000, 10,000, or
100,000 distinct locations. A biosensor surface with a large number of distinct
locations is called a microarray because one or more specific capture agents are
typically laid out in a regular grid pattern in x-y coordinates. However, a microarray
can comprise one or more specific capture agents laid out in a regular or irregular
pattern.

A microarray spot can range from about 50 to about 500 microns in 
diameter. Alternatively, a microarray spot can range from about 150 to about 200
microns in diameter. One or more specific capture agents can be bound to their
specific PET-containing binding partners.

In one biosensor embodiment, a microarray on a biosensor is created by
placing microdroplets of one or more specific capture agents onto, for example, an
x-y grid of locations on a two-dimensional grating or cover layer surface. When the
biosensor is exposed to a test sample comprising one or more PET binding partners,
the binding partners will be preferentially attracted to distinct locations on the
microarray that comprise capture agents that have high affinity for the PET binding
partners. Some of the distinct locations will gather binding partners onto their
surface, while other locations will not. Thus a specific capture agent specifically
binds to its PET binding partner, but does not substantially bind other PET binding
partners added to the surface of a biosensor. In an alternative embodiment, a nucleic
acid microarray (such as an aptamer array) is provided, in which each distinct
location within the array contains a different aptamer capture agent. By application
of specific capture agents with a microarray spotter onto a biosensor, specific
binding substance densities of 10,000 specific binding substances/in² can be
obtained. By focusing an illumination beam of a fiber optic probe to interrogate a
single microarray location, a biosensor can be used as a label-free microarray
readout system.
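
As a worked illustration of the spotting density quoted above, the following Python sketch converts an areal density into a centre-to-centre pitch on a square grid and compares it with the spot diameters mentioned in this description. The square-grid layout and the function names are assumptions of the sketch; the text does not prescribe a particular layout.

```python
import math

MICRONS_PER_INCH = 25400.0


def grid_pitch_microns(spots_per_square_inch):
    """Centre-to-centre spacing of a square grid with the given areal density."""
    if spots_per_square_inch <= 0:
        raise ValueError("density must be positive")
    return MICRONS_PER_INCH / math.sqrt(spots_per_square_inch)


def edge_gap_microns(pitch_microns, spot_diameter_microns):
    """Clear distance between neighbouring spots; negative means they overlap."""
    return pitch_microns - spot_diameter_microns


pitch = grid_pitch_microns(10_000)            # density quoted in the text
gap_small = edge_gap_microns(pitch, 150.0)    # 150 micron spots
gap_large = edge_gap_microns(pitch, 200.0)    # 200 micron spots
```

On these assumptions, 10,000 locations per square inch corresponds to a pitch of 254 microns, which leaves roughly 54 to 104 microns of clear surface between spots of 150 to 200 microns in diameter.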

For the detection of PET binding partners at concentrations of less than about
0.1 ng/ml, one may amplify and transduce binding partners bound to a biosensor
into an additional layer on the biosensor surface. The increased mass deposited on
the biosensor can be detected as a consequence of increased optical path length. By
incorporating greater mass onto a biosensor surface, an optical density of binding
partners on the surface is also increased, thus rendering a greater resonant
wavelength shift than would occur without the added mass. The addition of mass
can be accomplished, for example, enzymatically, through a "sandwich" assay, or by
direct application of mass (such as a second capture agent specific for the PET
peptide) to the biosensor surface in the form of appropriately conjugated beads or
polymers of various size and composition. Since the capture agents are
PET-specific, multiple capture agents of different types and specificity can be added
together to the captured PETs. This principle has been exploited for other types of
optical biosensors to demonstrate sensitivity increases over 1500× beyond
sensitivity limits achieved without mass amplification. See, e.g., Jenison et al.,
"Interference-based detection of nucleic acid targets on optically coated silicon,"
Nature Biotechnology, 19: 62-65, 2001.

In an alternative embodiment, a biosensor comprises surface-relief
volume diffractive structures (a SRVD biosensor). SRVD biosensors have a surface
that reflects predominantly at a particular narrow band of optical wavelengths when
illuminated with a broad band of optical wavelengths. Where specific capture agents
and/or PET binding partners are immobilized on a SRVD biosensor, the reflected
wavelength of light is shifted. One-dimensional surfaces, such as thin film
interference filters and Bragg reflectors, can select a narrow range of reflected or
transmitted wavelengths from a broadband excitation source. However, the
deposition of additional material, such as specific capture agents and/or PET binding
partners onto their upper surface results only in a change in the resonance linewidth,
rather than the resonance wavelength. In contrast, SRVD biosensors have the ability
to alter the reflected wavelength with the addition of material, such as specific
capture agents and/or binding partners to the surface.

A SRVD biosensor comprises a sheet material having a first and second 
surface. The first surface of the sheet material defines relief volume diffraction 
structures. Sheet material can comprise, for example, plastic, glass, semiconductor 
wafer, or metal film. A relief volume diffractive structure can be, for example, a 
two-dimensional grating, as described above, or a three-dimensional surface-relief
volume diffractive grating. The depth and period of relief volume diffraction
structures are less than the resonance wavelength of light reflected from a biosensor.
A three-dimensional surface-relief volume diffractive grating can be, for example, a
three-dimensional phase-quantized terraced surface relief pattern whose groove
pattern resembles a stepped pyramid. When such a grating is illuminated by a beam
of broadband radiation, light will be coherently reflected from the equally spaced
terraces at a wavelength given by twice the step spacing times the index of refraction
of the surrounding medium. Light of a given wavelength is resonantly diffracted or
reflected from the steps that are a half-wavelength apart, and with a bandwidth that
is inversely proportional to the number of steps. The reflected or diffracted color can
be controlled by the deposition of a dielectric layer so that a new wavelength is 
selected, depending on the index of refraction of the coating.

A stepped-phase structure can be produced first in photoresist by coherently
exposing a thin photoresist film to three laser beams, as described previously. See,
e.g., Cowen, "The recording and large scale replication of crossed holographic
grating arrays using multiple beam interferometry," in International Conference on
the Application, Theory, and Fabrication of Periodic Structures, Diffraction
Gratings, and Moire Phenomena II, Lerner, ed., Proc. Soc. Photo-Opt. Instrum. Eng.,
503, 120-129, 1984; Cowen, "Holographic honeycomb microlens," Opt. Eng. 24,
796-802 (1985); Cowen & Slafer, "The recording and replication of holographic
micropatterns for the ordering of photographic emulsion grains in film systems," J.
Imaging Sci. 31, 100-107, 1987. The nonlinear etching characteristics of photoresist
are used to develop the exposed film to create a three-dimensional relief pattern. The
photoresist structure is then replicated using standard embossing procedures. For
example, a thin silver film may be deposited over the photoresist structure to form a
conducting layer upon which a thick film of nickel can be electroplated. The nickel
"master" plate is then used to emboss directly into a plastic film, such as vinyl, that
has been softened by heating or solvent. A theory describing the design and
fabrication of three-dimensional phase-quantized terraced surface relief patterns that
resemble stepped pyramids is described in Cowen, "Aztec surface-relief volume
diffractive structure," J. Opt. Soc. Am. A, 7:1529 (1990). An example of a
three-dimensional phase-quantized terraced surface relief pattern may be a pattern that
resembles a stepped pyramid. Each inverted pyramid is approximately 1 micron in
diameter. Preferably, each inverted pyramid can be about 0.5 to about 5 microns in
diameter, including for example, about 1 micron. The pyramid structures can be
close-packed so that a typical microarray spot with a diameter of 150-200 microns
can incorporate several hundred stepped pyramid structures. The relief volume
diffraction structures have a period of about 0.1 to about 1 micron and a depth of
about 0.1 to about 1 micron.
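
The terrace reflection rule given earlier for the stepped grating (a reflected wavelength equal to twice the step spacing times the refractive index of the surrounding medium, with a bandwidth inversely proportional to the number of steps) lends itself to a short numerical illustration. The Python sketch below is offered only as an example; the function names and the 250 nm step spacing are hypothetical values chosen to fall within the 0.1 to 1 micron depth range stated above.

```python
def terrace_reflection_wavelength_nm(step_spacing_nm, n_medium):
    """Wavelength reflected coherently by equally spaced terraces: 2 * d * n."""
    if step_spacing_nm <= 0 or n_medium <= 0:
        raise ValueError("step spacing and refractive index must be positive")
    return 2.0 * step_spacing_nm * n_medium


def bandwidth_ratio(steps_a, steps_b):
    """Bandwidth of grating A relative to grating B, using bandwidth ~ 1/steps."""
    if steps_a <= 0 or steps_b <= 0:
        raise ValueError("step counts must be positive")
    return steps_b / steps_a


in_air = terrace_reflection_wavelength_nm(250.0, 1.00)    # hypothetical 250 nm steps
in_water = terrace_reflection_wavelength_nm(250.0, 1.33)  # same steps, water-filled
```

With these illustrative numbers the reflected wavelength moves from 500 nm in air to about 665 nm when the surrounding medium is water, which shows why material deposited on the terraces changes the reflected color.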

One or more specific binding substances, as described above, are
immobilized on the reflective material of a SRVD biosensor. One or more specific
binding substances can be arranged in a microarray of distinct locations, as described
above, on the reflective material.

A SRVD biosensor reflects light predominantly at a first single optical
wavelength when illuminated with a broad band of optical wavelengths, and reflects
light at a second single optical wavelength when one or more specific binding
substances are immobilized on the reflective surface. The reflection at the second
optical wavelength results from optical interference. A SRVD biosensor also reflects
light at a third single optical wavelength when the one or more specific capture
agents are bound to their respective PET binding partners, due to optical
interference. Readout of the reflected color can be performed serially by focusing a
microscope objective onto individual microarray spots and reading the reflected
spectrum with the aid of a spectrograph or imaging spectrometer, or in parallel by,
for example, projecting the reflected image of the microarray onto an imaging
spectrometer incorporating a high resolution color CCD camera.

A SRVD biosensor can be manufactured by, for example, producing a metal 
master plate, and stamping a relief volume diffractive structure into, for example, a 
plastic material like vinyl. After stamping, the surface is made reflective by blanket
deposition of, for example, a thin metal film such as gold, silver, or aluminum.
Compared to MEMS-based biosensors that rely upon photolithography, etching, and 
wafer bonding procedures, the manufacture of a SRVD biosensor is very 
inexpensive. 

A SWS or SRVD biosensor embodiment can comprise an inner surface. In
one preferred embodiment, such an inner surface is a bottom surface of a
liquid-containing vessel. A liquid-containing vessel can be, for example, a microtiter plate
well, a test tube, a petri dish, or a microfluidic channel. In one embodiment, a SWS
or SRVD biosensor is incorporated into a microtiter plate. For example, a SWS
biosensor or SRVD biosensor can be incorporated into the bottom surface of a
microtiter plate by assembling the walls of the reaction vessels over the resonant
reflection surface, so that each reaction "spot" can be exposed to a distinct test
sample. Therefore, each individual microtiter plate well can act as a separate
reaction vessel. Separate chemical reactions can, therefore, occur within adjacent
wells without intermixing reaction fluids and chemically distinct test solutions can
be applied to individual wells.

This technology is useful in applications where large numbers of
biomolecular interactions are measured in parallel, particularly when molecular
labels would alter or inhibit the functionality of the molecules under study.
High-throughput screening of pharmaceutical compound libraries with protein targets, and
microarray screening of protein-protein interactions for proteomics are examples of
applications that require the sensitivity and throughput afforded by the compositions
and methods of the invention.

Unlike surface plasmon resonance, resonant mirrors, and waveguide
biosensors, the described compositions and methods enable many thousands of
individual binding reactions to take place simultaneously upon the biosensor surface.
This technology is useful in applications where large numbers of biomolecular
interactions are measured in parallel (such as in an array), particularly when
molecular labels alter or inhibit the functionality of the molecules under study.
These biosensors are especially suited for high-throughput screening of
pharmaceutical compound libraries with protein targets, and microarray screening of
protein-protein interactions for proteomics. A biosensor of the invention can be
manufactured, for example, in large areas using a plastic embossing process, and
thus can be inexpensively incorporated into common disposable laboratory assay
platforms such as microtiter plates and microarray slides.

Other similar biosensors may also be used in the instant invention. Numerous
biosensors have been developed to detect a variety of biomolecular complexes
including oligonucleotides, antibody-antigen interactions, hormone-receptor
interactions, and enzyme-substrate interactions. In general, these biosensors consist
of two components: a highly specific recognition element and a transducer that
converts the molecular recognition event into a quantifiable signal. Signal
transduction has been accomplished by many methods, including fluorescence,
interferometry (Jenison et al., "Interference-based detection of nucleic acid targets
on optically coated silicon," Nature Biotechnology, 19, p. 62-65; Lin et al., "A
porous silicon-based optical interferometric biosensor," Science, 278, p. 840-843,
1997), and gravimetry (A. Cunningham, Bioanalytical Sensors, John Wiley & Sons
(1998)). Of the optically-based transduction methods, direct methods that do not
require labeling of analytes with fluorescent compounds are of interest due to the
relative assay simplicity and ability to study the interaction of small molecules and
proteins that are not readily labeled.

These direct optical methods include surface plasmon resonance (SPR)
(Jordan & Corn, "Surface Plasmon Resonance Imaging Measurements of
Electrostatic Biopolymer Adsorption onto Chemically Modified Gold Surfaces,"
Anal. Chem., 69:1449-1456 (1997)); plasmon-resonant particles (PRPs) (Schultz et
al., Proc. Nat. Acad. Sci., 97: 996-1001 (2000)); grating couplers (Morhard et al.,
"Immobilization of antibodies in micropatterns for cell detection by optical
diffraction," Sensors and Actuators B, 70, p. 232-242, 2000); ellipsometry (Jin et al.,
"A biosensor concept based on imaging ellipsometry for visualization of
biomolecular interactions," Analytical Biochemistry, 232, p. 69-72, 1995);
evanescent wave devices (Huber et al., "Direct optical immunosensing (sensitivity
and selectivity)," Sensors and Actuators B, 6, p. 122-126, 1992); resonance light
scattering (Bao et al., Anal. Chem., 74:1792-1797 (2002)); and reflectometry (Brecht
& Gauglitz, "Optical probes and transducers," Biosensors and Bioelectronics, 10, p.
923-936, 1995). Changes in the optical phenomenon of surface plasmon resonance
(SPR) can be used as an indication of real-time reactions between biological
molecules. Theoretically predicted detection limits of these detection methods have
been determined and experimentally confirmed to be feasible down to diagnostically
relevant concentration ranges.

Surface plasmon resonance (SPR) has been successfully incorporated into an
immunosensor format for the simple, rapid, and nonlabeled assay of various
biochemical analytes. Proteins, complex conjugates, toxins, allergens, drugs, and
pesticides can be determined directly using either natural antibodies or synthetic
receptors with high sensitivity and selectivity as the sensing element.
Immunosensors are capable of real-time monitoring of the antigen-antibody
reaction. A wide range of molecules can be detected with lower limits ranging
between 10⁻⁹ and 10⁻¹³ mol/L. Several successful commercial developments of SPR
immunosensors are available and their web pages are rich in technical information.
Wayne et al. (Methods 22: 77-91, 2000) reviewed and highlighted many recent
developments in SPR-based immunoassay, functionalizations of the gold surface,
novel receptors in molecular recognition, and advanced techniques for sensitivity
enhancement.

Utilization of the optical phenomenon surface plasmon resonance (SPR) has
seen extensive growth since its initial observation by Wood in 1902 (Phil. Mag. 4
(1902), pp. 396-402). SPR is a simple and direct sensing technique that can be used
to probe refractive index (η) changes that occur in the very close vicinity of a thin
metal film surface (Otto Z. Phys. 216 (1968), p. 398). The sensing mechanism
exploits the properties of an evanescent field generated at the site of total internal
reflection. This field penetrates into the metal film, with exponentially decreasing
amplitude from the glass-metal interface. Surface plasmons, which oscillate and
propagate along the upper surface of the metal film, absorb some of the
plane-polarized light energy from this evanescent field to change the total internal
reflection light intensity I_r. A plot of I_r versus incidence (or reflection) angle θ
produces an angular intensity profile that exhibits a sharp dip. The exact location of
the dip minimum (or the SPR angle θ_r) can be determined by using a polynomial
algorithm to fit the I_r signals from a few diodes close to the minimum. The binding
of molecules on the upper metal surface causes a change in η of the surface medium
that can be observed as a shift in θ_r.
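
The polynomial fit mentioned above can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not the algorithm of any particular instrument: it fits a quadratic by least squares to the few intensity readings nearest the minimum and takes the vertex as the SPR angle. The function names, the window of five readings, and the synthetic intensity profiles are all hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 matching points")
    sx = [sum(x ** k for x in xs) for k in range(5)]              # sums of x^0..x^4
    sy = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations for the unknowns (c, b, a), as an augmented matrix.
    m = [[sx[0], sx[1], sx[2], sy[0]],
         [sx[1], sx[2], sx[3], sy[1]],
         [sx[2], sx[3], sx[4], sy[2]]]
    for col in range(3):                                          # Gaussian elimination
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("degenerate fit: points do not determine a quadratic")
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                                           # back substitution
        sol[r] = (m[r][3] - sum(m[r][k] * sol[k] for k in range(r + 1, 3))) / m[r][r]
    c, b, a = sol
    return a, b, c


def spr_angle(angles_deg, intensities, half_window=2):
    """Angle of the reflectance dip, from a quadratic fit around the lowest reading."""
    i_min = min(range(len(intensities)), key=lambda k: intensities[k])
    lo = max(0, i_min - half_window)
    hi = min(len(intensities), i_min + half_window + 1)
    center = angles_deg[i_min]
    xs = [a - center for a in angles_deg[lo:hi]]   # centre angles for numerical stability
    a, b, _ = fit_quadratic(xs, intensities[lo:hi])
    if a <= 0:
        raise ValueError("fitted curve has no minimum in the window")
    return center - b / (2.0 * a)


# Synthetic, made-up profiles: a dip at 62.03 degrees that moves to 62.17 after binding.
angles = [60.0 + 0.1 * k for k in range(41)]
baseline = [0.2 + 0.5 * (a - 62.03) ** 2 for a in angles]
bound = [0.2 + 0.5 * (a - 62.17) ** 2 for a in angles]
shift_deg = spr_angle(angles, bound) - spr_angle(angles, baseline)
```

Because the fit interpolates between detector elements, the dip position can be located more finely than the angular spacing of the diodes, which is the purpose of the fitting step described in the text.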

The potential of SPR for biosensor purposes was realized in 1982-1983 by
Liedberg et al., who adsorbed an immunoglobulin G (IgG) antibody overlayer on the
gold sensing film, resulting in the subsequent selective binding and detection of IgG
(Nylander et al., Sens. Actuators 3 (1982), pp. 79-84; Liedberg et al., Sens.
Actuators 4 (1983), pp. 229-304). The principles of SPR as a biosensing technique
have been reviewed previously (Daniels et al., Sens. Actuators 15 (1988), pp. 11-18;
VanderNoot and Lai, Spectroscopy 6 (1991), pp. 28-33; Lundstrom Biosens.
Bioelectron. 9 (1994), pp. 725-736; Liedberg et al., Biosens. Bioelectron. 10 (1995);
Morgan et al., Clin. Chem. 42 (1996), pp. 193-209; Tapuchi et al., S. Afr. J. Chem.
49 (1996), pp. 8-25). Applications of SPR to biosensing were demonstrated for a
wide range of molecules, from virus particles to sex hormone-binding globulin and
syphilis. Most importantly, SPR has an inherent advantage over other types of
biosensors in its versatility and capability of monitoring binding interactions without
the need for fluorescence or radioisotope labeling of the biomolecules. This
approach has also shown promise in the real-time determination of concentration,
kinetic constant, and binding specificity of individual biomolecular interaction steps. 
Antibody-antigen interactions, peptide/protein-protein interactions, DNA
hybridization conditions, biocompatibility studies of polymers, biomolecule-cell 
receptor interactions, and DNA/receptor-ligand interactions can all be analyzed 
(Pathak and Savelkoul, Immunol. Today 18 (1997), pp. 464-467). Commercially, the
use of SPR-based immunoassay has been promoted by companies such as Biacore
(Uppsala, Sweden) (Jonsson et al., Ann. Biol. Clin. 51 (1993), pp. 19-26), Windsor
Scientific (U.K.) (WWW URL for Windsor Scientific IBIS Biosensor), Quantech 
(Minnesota) (WWW URL for Quantech), and Texas Instruments (Dallas, TX) 
(WWW URL for Texas Instruments). 

In yet another embodiment, a fluorescent polymer superquenching-based
bioassay as disclosed in WO 02/074997 may be used for detecting binding of the
unlabeled PET to its capture agents. In this embodiment, a capture agent that is
specific for both a target PET peptide and a chemical moiety is used. The chemical
moiety includes (a) a recognition element for the capture agent, (b) a fluorescent
property-altering element, and (c) a tethering element linking the recognition
element and the property-altering element. A fluorescent polymer and the capture
agent are co-located on a support. When the chemical
moiety is bound to the capture agent, the property-altering element of the chemical
moiety is sufficiently close to the fluorescent polymer to alter (quench) the
fluorescence emitted by the polymer. When an analyte sample is introduced, the
target PET peptide, if present, binds to the capture agent, thereby displacing the
chemical moiety from the capture agent, resulting in de-quenching and an increase of
detected fluorescence. Assays for detecting the presence of a target biological agent
are also disclosed in the application.

In another related embodiment, the binding event between the capture agents
and the PET can be detected by using a water-soluble luminescent quantum dot as
described in US2003/0008414A1. In one embodiment, a water-soluble luminescent
semiconductor quantum dot comprises a core, a cap and a hydrophilic attachment
group. The "core" is a nanoparticle-sized semiconductor. While any core of the IIB-VIB,
IIIB-VB or IVB-IVB semiconductors can be used in this context, the core must
be such that, upon combination with a cap, a luminescent quantum dot results. A
IIB-VIB semiconductor is a compound that contains at least one element from
Group IIB and at least one element from Group VIB of the periodic table, and so
on. Preferably, the core is a IIB-VIB, IIIB-VB or IVB-IVB semiconductor that
ranges in size from about 1 nm to about 10 nm. The core is more preferably a IIB-VIB
semiconductor and ranges in size from about 2 nm to about 5 nm. Most
preferably, the core is CdS or CdSe. In this regard, CdSe is especially preferred as
the core, in particular at a size of about 4.2 nm.

The "cap" is a semiconductor that differs from the semiconductor of the core 
and binds to the core, thereby forming a surface layer on the core. The cap must be
such that, upon combination with a given semiconductor core, a
luminescent quantum dot results. The cap should passivate the core by having a higher band
gap than the core. In this regard, the cap is preferably a IIB-VIB semiconductor of 
high band gap. More preferably, the cap is ZnS or CdS. Most preferably, the cap is 
ZnS. In particular, the cap is preferably ZnS when the core is CdSe or CdS and the 
cap is preferably CdS when the core is CdSe. 

The "attachment group" as that term is used herein refers to any organic
group that can be attached, such as by any stable physical or chemical association, to
the surface of the cap of the luminescent semiconductor quantum dot and can render
the quantum dot water-soluble without rendering the quantum dot no longer
luminescent. Accordingly, the attachment group comprises a hydrophilic moiety.
Preferably, the attachment group enables the hydrophilic quantum dot to remain in
solution for at least about one hour, one day, one week, or one month. Desirably, the
attachment group is attached to the cap by covalent bonding and is attached to the
cap in such a manner that the hydrophilic moiety is exposed. Preferably, the
hydrophilic attachment group is attached to the quantum dot via a sulfur atom. More
preferably, the hydrophilic attachment group is an organic group comprising a sulfur
atom and at least one hydrophilic attachment group. Suitable hydrophilic attachment
groups include, for example, a carboxylic acid or salt thereof, a sulfonic acid or salt
thereof, a sulfamic acid or salt thereof, an amino substituent, a quaternary
ammonium salt, and a hydroxy. The organic group of the hydrophilic attachment
group of the present invention is preferably a C1-C6 alkyl group or an aryl group,
more preferably a C1-C6 alkyl group, even more preferably a C1-C3 alkyl group.
Therefore, in a preferred embodiment, the attachment group of the present invention
is a thiol carboxylic acid or thiol alcohol. More preferably, the attachment group is a
thiol carboxylic acid. Most preferably, the attachment group is mercaptoacetic acid.

Accordingly, a preferred embodiment of a water-soluble luminescent
semiconductor quantum dot is one that comprises a CdSe core of about 4.2 nm in
size, a ZnS cap and an attachment group. Another preferred embodiment of a
water-soluble luminescent semiconductor quantum dot is one that comprises a CdSe
core, a ZnS cap and the attachment group mercaptoacetic acid. An especially
preferred water-soluble luminescent semiconductor quantum dot comprises a CdSe
core of about 4.2 nm, a ZnS cap of about 1 nm and a mercaptoacetic acid attachment
group.

The capture agent of the instant invention can be attached to the quantum dot
via the hydrophilic attachment group to form a conjugate. The capture agent can
be attached, such as by any stable physical or chemical association, to the
hydrophilic attachment group of the water-soluble luminescent quantum dot directly
or indirectly by any suitable means, through one or more covalent bonds, via an
optional linker that does not impair the function of the capture agent or the quantum
dot. For example, if the attachment group is mercaptoacetic acid and a nucleic acid
biomolecule is being attached to the attachment group, the linker preferably is a
primary amine, a thiol, streptavidin, neutravidin, biotin, or a like molecule. If the
attachment group is mercaptoacetic acid and a protein biomolecule or a fragment
thereof is being attached to the attachment group, the linker preferably is
streptavidin, neutravidin, biotin, or a like molecule.

When a PET-containing sample is contacted with the quantum dot-capture agent
conjugate described above, luminescence is emitted when the capture agent of the
conjugate specifically binds to the PET
peptide. This is particularly useful when the capture agent is a nucleic acid aptamer
or an antibody. When the aptamer is used, an alternative embodiment may be
employed, in which a fluorescent quencher may be positioned adjacent to the
quantum dot via a self-pairing stem-loop structure when the aptamer is not bound to
a PET-containing sequence. When the aptamer binds to the PET, the stem-loop
structure is opened, thus relieving the quenching effect and generating luminescence.

In another related embodiment, arrays of nanosensors comprising nanowires
or nanotubes as described in US2002/0117659A1 may be used for detection and/or
quantitation of PET-capture agent interaction. Briefly, a "nanowire" is an elongated
nanoscale semiconductor, which can have a cross-sectional dimension as thin as 1
nanometer. Similarly, a "nanotube" is a nanowire that has a hollowed-out core, and
includes those nanotubes known to those of ordinary skill in the art. A "wire" refers
to any material having a conductivity at least that of a semiconductor or metal.
These nanowires / nanotubes may be used in a system constructed and arranged to
determine an analyte (e.g., PET peptide) in a sample to which the nanowire(s) is
exposed. The surface of the nanowire is functionalized by coating with a capture
agent. Binding of an analyte to the functionalized nanowire causes a detectable
change in the electrical conductivity or optical properties of the nanowire. Thus,
presence of the analyte can be determined by determining a change in a
characteristic in the nanowire, typically an electrical characteristic or an optical
characteristic. A variety of biomolecular entities can be used for coating, including,
but not limited to, amino acids, proteins, sugars, DNA, antibodies, antigens, and
enzymes, etc. For more details such as construction of nanowires, functionalization
with various biomolecules (such as the capture agents of the instant invention), and
detection in nanowire devices, see US2002/0117659A1 (incorporated by reference).
Since multiple nanowires can be used in parallel, each with a different capture agent
as the functionalized group, this technology is ideally suited for large scale arrayed
detection of PET-containing peptides in biological samples without the need to label
the PET peptides. This nanowire detection technology has been successfully used to
detect pH change (H+ binding), biotin-streptavidin binding, antibody-antigen
binding, and metal (Ca2+) binding with picomolar sensitivity and in real time (Cui et al.,
Science 293: 1289-1292).

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
(MALDI-TOF MS) uses a laser pulse to desorb proteins from the surface, followed
by mass spectrometry to identify the molecular weights of the proteins (Gilligan et
al., Mass spectrometry after capture and small-volume elution of analyte from a
surface plasmon resonance biosensor. Anal. Chem. 74 (2002), pp. 2041-2047).
Because this method only measures the mass of proteins at the interface, and
because the desorption protocol is sufficiently mild that it does not result in
fragmentation, MALDI can provide straightforward useful information such as
confirming the identity of the bound PET peptide, or any enzymatic modification of
a PET peptide. For that matter, MALDI can be used to identify proteins that are
bound to immobilized capture agents. An important technique for identifying bound
proteins relies on treating the array (and the proteins that are selectively bound to the
array) with proteases and then analyzing the resulting peptides to obtain sequence
data.

IV. Samples and Their Preparation

The capture agents or an array of capture agents typically are contacted with 
a sample, e.g., a biological fluid, a water sample, or a food sample, which has been 
fragmented to generate a collection of peptides, under conditions suitable for 
binding a PET corresponding to a protein of interest. 

Samples to be assayed using the capture agents of the present invention may
be drawn from various physiological, environmental or artificial sources. In
particular, physiological samples such as body fluids or tissue samples of a patient
or an organism may be used as assay samples. Such fluids include, but are not
limited to, saliva, mucus, sweat, whole blood, serum, urine, amniotic fluid, genital
fluids, fecal material, marrow, plasma, spinal fluid, pericardial fluids, gastric fluids,
abdominal fluids, peritoneal fluids, pleural fluids and extraction from other body
parts, and secretion from other glands. Alternatively, biological samples drawn from
cells taken from the patient or grown in culture may be employed. Such samples
include supernatants, whole cell lysates, or cell fractions obtained by lysis and
fractionation of cellular material. Extracts of cells and fractions thereof, including
those directly from a biological entity and those grown in an artificial environment,
can also be used. In addition, a biological sample can be obtained and/or derived
from, for example, blood, plasma, serum, gastrointestinal secretions, homogenates of
tissues or tumors, synovial fluid, feces, saliva, sputum, cyst fluid, amniotic fluid,
cerebrospinal fluid, peritoneal fluid, lung lavage fluid, semen, lymphatic fluid, tears,
or prostatic fluid.

A general scheme of sample preparation prior to its use in the methods of the
instant invention is described in Figure 6 (slide 45 of D2). Briefly, a sample can be
pretreated by extraction and/or dilution to minimize the interference from certain
substances present in the sample. The sample can then be either chemically reduced,
denatured and alkylated, or subjected to thermo-denaturation. Regardless of the
denaturation step, the denatured sample is then digested by a protease, such as
trypsin, before it is used in subsequent assays. A desalting step may also be added
just after protease digestion if chemical denaturation is used. This process is
generally simple, robust and reproducible, and is generally applicable to the main
sample types including serum, cell lysates and tissues.

The sample may be pretreated to remove extraneous materials, stabilized,
buffered, preserved, filtered, or otherwise conditioned as desired or necessary.
Proteins in the sample typically are fragmented, either as part of the methods of the
invention or in advance of performing these methods. Fragmentation can be
performed using any art-recognized desired method, such as by using chemical
cleavage (e.g., cyanogen bromide); enzymatic means (e.g., using a protease such as
trypsin, chymotrypsin, pepsin, papain, carboxypeptidase, calpain, subtilisin, Glu-C,
endo Lys-C and proteinase K, or a collection or sub-collection thereof); or physical
means (e.g., fragmentation by physical shearing or fragmentation by sonication). As
used herein, the terms "fragmentation," "cleavage," "proteolytic cleavage,"
"proteolysis," "restriction" and the like are used interchangeably and refer to scission
of a chemical bond, typically a peptide bond, within proteins to produce a collection
of peptides (i.e., protein fragments).

The purpose of the fragmentation is to generate PET-comprising peptides
which are soluble and available for binding with a capture agent. In essence, the
sample preparation is designed to assure to the extent possible that all PET present
on or within relevant proteins that may be present in the sample are available for
reaction with the capture agents. This strategy can avoid many of the problems
encountered with previous attempts to design protein chips caused by protein-protein
complexation, post-translational modifications and the like.

In one embodiment, the sample of interest is treated using a pre-determined
protocol which: (A) inhibits masking of the target protein caused by target protein-protein
non-covalent or covalent complexation or aggregation, target protein
degradation or denaturing, target protein post-translational modification, or
environmentally induced alteration in target protein tertiary structure, and (B)
fragments the target protein to thereby produce at least one peptide epitope (i.e., a
PET) whose concentration is directly proportional to the true concentration of the
target protein in the sample. The sample treatment protocol is designed and
empirically tested to result reproducibly in the generation of a PET that is available
for reaction with a given capture agent. The treatment can involve protein
separations; protein fractionations; solvent modifications such as polarity changes,
osmolality changes, dilutions, or pH changes; heating; freezing; precipitating;
extractions; reactions with a reagent such as an endo-, exo- or site-specific protease;
non-proteolytic digestion; oxidations; reductions; neutralization of some biological

For example, the sample may be treated with an alkylating agent and a
reducing agent in order to prevent the formation of dimers or other aggregates
through disulfide/dithiol exchange. The sample of PET-containing peptides may also
be treated to remove secondary modifications, including but not limited to
phosphorylation, methylation, glycosylation, acetylation and prenylation, using, for
example, respective modification-specific enzymes such as phosphatases, etc.

In one embodiment, proteins of a sample will be denatured, reduced and/or 
alkylated, but will not be proteolytically cleaved. Proteins can be denatured by 
thermal denaturation or organic solvents, then subjected to direct detection or 
optionally, further proteolytic cleavage. 

The use of thermal denaturation (50-90 °C for about 20 minutes) of proteins
prior to enzyme digestion in solution is preferred over chemical denaturation (such
as 6-8 M guanidine HCl or urea) because it does not require the purification /
concentration that might otherwise be preferred or required prior to subsequent analysis.
Park and Russell reported that enzymatic digestions of proteins that are resistant to
proteolysis are significantly enhanced by thermal denaturation (Anal. Chem., 72
(11): 2667-2670, 2000). Native proteins that are sensitive to proteolysis show
similar or just slightly lower digestion yields following thermal denaturation.
Proteins that are resistant to digestion become more susceptible to digestion,
independent of protein size, following thermal denaturation. For example, amino
acid sequence coverage from digest fragments increases from 15 to 86% in
myoglobin and from 0 to 43% in ovalbumin. This leads to more rapid and reliable
protein identification by the instant invention, especially for protease-resistant
proteins.

Although some proteins aggregate upon thermal denaturation, the protein
aggregates are easily digested by trypsin and generate sufficient numbers of digest
fragments for protein identification. In fact, protein aggregation may be the reason
thermal denaturation facilitates digestion in most cases. Protein aggregates are
believed to be the oligomerization products of the denatured form of protein
(Copeland, R. A. Methods for Protein Analysis; Chapman & Hall: New York, NY,
1994). In general, hydrophobic parts of the protein are located inside and relatively
less hydrophobic parts of the protein are exposed to the aqueous environment.
During the thermal denaturation, intact proteins are gradually unfolded into a
denatured conformation and sufficient energy is provided to prevent folding back to
the native conformation. The probability for interactions with other denatured
proteins is increased, thus allowing hydrophobic interactions between exposed
hydrophobic parts of the proteins. In addition, protein aggregates of the denatured
protein can have a more protease-labile structure than nondenatured proteins
because more cleavage sites are exposed to the environment. Protein aggregates are
easily digested, so that protein aggregates are not observed at the end of 3 h of
trypsin digestion (Park and Russell, Anal. Chem., 72 (11): 2667-2670, 2000).
Moreover, trypsin digestion of protein aggregates generates more specific cleavage
products.

Ordinary proteases such as trypsin may be used after denaturation. The
process may be repeated for one or more rounds after the first round of denaturation
and digestion. Alternatively, this thermal denaturation process can be further
assisted by using thermophilic trypsin-like enzymes, so that denaturation and
digestion can be done simultaneously. For example, Nongporn Towatana et al. (J. of
Bioscience and Bioengineering 87(5): 581-587, 1999) reported the purification to
apparent homogeneity of an alkaline protease from culture supernatants of Bacillus
sp. PS719, a novel alkaliphilic, thermophilic bacterium isolated from a thermal
spring soil sample. The protease exhibited maximum activity towards azocasein at
pH 9.0 and at 75°C. The enzyme was stable in the pH range 8.0 to 10.0 and up to
80°C in the absence of Ca2+. This enzyme appears to be a trypsin-like serine
protease, since phenylmethylsulfonyl fluoride (PMSF) and 3,4-dichloroisocoumarin
(DCI) in addition to N-α-p-tosyl-L-lysine chloromethyl ketone (TLCK) completely
inhibited the activity. Among the various oligopeptidyl-p-nitroanilides tested, the
protease showed a preference for cleavage at arginine residues on the carboxylic
side of the scissile bond of the substrate, liberating p-nitroaniline from
N-carbobenzoxy (CBZ)-L-arginine-p-nitroanilide with K_m and V_max values of 0.6
mM and 1.0 µmol min^-1 mg protein^-1, respectively.

Alternatively, existing proteases may be chemically modified to achieve
enhanced thermostability for use in this type of application. Mozhaev et al. (Eur J
Biochem. 173(1): 147-54, 1988) experimentally verified the idea presented earlier
that the contact of nonpolar clusters located on the surface of protein molecules with
water destabilizes proteins. It was demonstrated that protein stabilization could be
achieved by artificial hydrophilization of the surface area of protein globules by
chemical modification. Two experimental systems were studied for the verification
of the hydrophilization approach. In one experiment, the surface tyrosine residues of
trypsin were transformed to aminotyrosines using a two-step modification
procedure: nitration by tetranitromethane followed by reduction with sodium
dithionite. The modified enzyme was much more stable against irreversible
thermo-inactivation: the stabilizing effect increased with the number of aminotyrosine
residues in trypsin and the modified enzyme could become even 100 times more
stable than the native one. In another experiment, alpha-chymotrypsin was
covalently modified by treatment with anhydrides or chloroanhydrides of aromatic
carboxylic acids. As a result, different numbers of additional carboxylic groups (up
to five depending on the structure of the modifying reagent) were introduced into
each Lys residue modified. Acylation of all available amino groups of
alpha-chymotrypsin by cyclic anhydrides of pyromellitic and mellitic acids resulted in a
substantial hydrophilization of the protein as estimated by partitioning in an aqueous
Ficoll-400/Dextran-70 biphasic system. These modified enzyme preparations were
extremely stable against irreversible thermal inactivation at elevated temperatures
(65-98°C); their thermostability was practically equal to the stability of proteolytic
enzymes from extremely thermophilic bacteria, the most stable proteinases known to
date. Similar approaches may be applied to any other chosen proteases for the subject
method.

In other embodiments, samples can be pre-treated with reducing agents such 
as β-mercaptoethanol or DTT to reduce the disulfide bonds to facilitate digestion.

Fractionation may be performed using any single or multidimensional
chromatography, such as reversed phase chromatography (RPC), ion exchange
chromatography, hydrophobic interaction chromatography, size exclusion
chromatography, or affinity fractionation such as immunoaffinity and immobilized
metal affinity chromatography. Preferably, the fractionation involves
surface-mediated selection strategies. Electrophoresis, either slab gel or capillary
electrophoresis, can also be used to fractionate the peptides in the sample. Examples
of slab gel electrophoretic methods include sodium dodecyl sulfate polyacrylamide
gel electrophoresis (SDS-PAGE) and native gel electrophoresis. Capillary
electrophoresis methods that can be used for fractionation include capillary gel
electrophoresis (CGE), capillary zone electrophoresis (CZE) and capillary
electrochromatography (CEC), capillary isoelectric focusing, immobilized metal
affinity chromatography and affinity electrophoresis.

Protein precipitation may be performed using techniques well known in the 
art. For example, precipitation may be achieved using known precipitants, such as 
potassium thiocyanate, trichloroacetic acid and ammonium sulphate. 

Subsequent to fragmentation, the sample may be contacted with the capture
agents of the present invention, e.g., capture agents immobilized on a planar support
or on a bead, as described herein. Alternatively, the fragmented sample (containing a
collection of peptides) may be fractionated based on, for example, size,
post-translational modifications (e.g., glycosylation or phosphorylation) or antigenic
properties, and then contacted with the capture agents of the present invention, e.g.,
capture agents immobilized on a planar support or on a bead.

Figure 7 provides an illustrative example of serum sample pre-treatment
using either the thermo-denaturation or the chemical denaturation. Briefly, for 
thermo-denaturation, 100 µL of human serum (about 75 mg/mL total protein) is first
diluted 10-fold to about 7.5 mg/mL. The diluted sample is then heated to 90°C for 5
minutes to denature the proteins, followed by 30 minutes of trypsin digestion at
55°C. The trypsin is inactivated at 80°C after the digestion. 

For chemical denaturation, about 1.8 mL of human serum proteins diluted to
about 4 mg/mL is denatured in a final concentration of 50 mM HEPES buffer (pH
8.0), 8 M urea and 10 mM DTT. Iodoacetamide is then added to 25 mM final
concentration. The denatured sample is then further diluted to about 1 mg/mL for
protease digestion. The digested sample is passed through a desalting column before
being used in subsequent assays.

Figure 8 shows the result of thermo-denaturation and chemical denaturation
of serum proteins and cell lysates (MOLT4 and HeLa cells). It is evident that
denaturation was successful for the majority, if not all, of the proteins in both the
thermo- and chemical-denaturation lanes, and both methods achieved comparable
results in terms of protein denaturation and fragmentation.

The above example is for illustrative purposes only and is by no means
limiting. Minor alterations of the protocol depending on specific uses can be easily
achieved for optimal results in individual assays.


V. Selection of PET 

One advantage of the PET of the instant invention is that PET can be
determined in silico and generated in vitro (such as by peptide synthesis) without
cloning or purifying the protein to which it belongs. PET is also advantageous over the
full-length tryptic fragments (or for that matter, any other fragments that predictably
result from any other treatments) since full-length tryptic fragments tend to contain
one or more PETs themselves, though the tryptic fragment itself may be unique
simply because of its length (the longer a stretch of peptide, the more likely it will be
unique). A direct implication is that, by using relatively short and unique PETs
rather than the full-length (tryptic) peptide fragments, the method of the instant
invention has greatly reduced, if not completely eliminated, the risk of having
multiple antibodies with unique specificities against the same peptide fragment, a
source of antibody cross-reactivity. An additional advantage arises from
the PET selection process, such as the nearest-neighbor analysis and ranking
prioritization (see below), which further eliminates the chance of cross-reactivity. All
these features make the PET-based methods particularly suitable for genome-wide
analysis using multiplexing techniques.

The PET of the instant invention can be selected in various ways. In the
simplest embodiment, the PET for a given organism or biological sample can be
generated or identified by a brute force search of the relevant database, using all
theoretically possible PET with a given length. This process is preferably carried out
computationally using, for example, any of the sequence search tools available in the
art or variations thereof. For example, to identify PET of 5 amino acids in length (a
total of 3.2 million possible PET candidates, see table 2.2.2 below), each of the 3.2
million candidates may be used as a query sequence to search against the human
proteome as described below. Any candidate that has more than one hit (found in two
or more proteins) is immediately eliminated before further searching is done. At the
end of the search, a list of human proteins that have one or more PETs can be
obtained (see Example 1 below). The same or similar procedure can be used for any
pre-determined organism or database.

For example, PETs for each human protein can be identified using the
following procedure. A Perl program is developed to calculate the occurrence of all
possible peptides, given by 20^N, of defined length N (amino acids) in human
proteins. For example, the total tag space is 160,000 (20^4) for tetramer peptides, 3.2
M (20^5) for pentamer peptides, and 64 M (20^6) for hexamer peptides, and so on.
Predicted human protein sequences are analyzed for the presence or absence of all
possible peptides of N amino acids. PET are the peptide sequences that occur only
once in the human proteome. Thus the presence of a specific PET is an intrinsic
property of the protein sequence and is operationally independent. According to this
approach, a definitive set of PETs can be defined and used regardless of the sample
processing procedure (operational independence).
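
The counting procedure described above can be sketched as follows. This is an illustrative Python sketch rather than the Perl program referred to in the text. The names find_pets and tag_space and the two-protein toy "proteome" are hypothetical. The sketch applies the "occurs only once" criterion literally by counting every window, and a variant that counts each peptide once per protein would instead implement the "found in two or more proteins" elimination rule.

```python
from collections import Counter


def tag_space(n, alphabet_size=20):
    """Number of possible peptides of length n (20^n for the 20 amino acids)."""
    return alphabet_size ** n


def find_pets(proteome, n):
    """Return {protein name: [peptides of length n that occur exactly once
    across all sequences]} for a name -> sequence dictionary."""
    counts = Counter()
    for seq in proteome.values():
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    pets = {}
    for name, seq in proteome.items():
        pets[name] = [seq[i:i + n] for i in range(len(seq) - n + 1)
                      if counts[seq[i:i + n]] == 1]
    return pets


# Toy example with two made-up sequences that share the pentamer "MKTAY".
toy_proteome = {"protA": "MKTAYIAKQR", "protB": "MKTAYLVNDE"}
toy_pets = find_pets(toy_proteome, 5)
print(tag_space(5), toy_pets["protA"])
```

Enumerating only the peptides that actually occur in the sequences, instead of querying each of the 20^N candidates in turn, gives the same set of unique peptides while avoiding searches for candidates that are absent from every protein.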

In one embodiment, to speed up the searching process, computer algorithms 
may be developed or modified to eliminate unnecessary searches before the actual 
search begins. 

Using the example above, two highly related human proteins (say, differing only in
a few amino acid positions) may be aligned, and a large number of candidate
PET can be eliminated based on the sequence of the identical regions. For example,
if there is a stretch of identical sequence of 20 amino acids, then sixteen 5-amino
acid PETs can be eliminated without searching, by virtue of their simultaneous
appearance in two non-identical human proteins. This elimination process can be
continued using as many highly related protein pairs or families as possible, for example
evolutionarily conserved proteins such as histones, globins, etc.
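
The arithmetic behind this pre-elimination step can be checked with a short sketch: an identical stretch of length L contains L - N + 1 windows of length N, so a 20-residue stretch contains sixteen 5-mers. The function names and the two example sequences below are hypothetical, and the set intersection is used here as a simple stand-in for a proper sequence alignment.

```python
def windows_in_stretch(stretch_length, n):
    """Number of length-n windows contained in an identical stretch."""
    return max(0, stretch_length - n + 1)


def shared_peptides(seq_a, seq_b, n):
    """Peptides of length n present in both sequences; none of these can be
    a PET, so they can be dropped before any database search."""
    kmers_a = {seq_a[i:i + n] for i in range(len(seq_a) - n + 1)}
    kmers_b = {seq_b[i:i + n] for i in range(len(seq_b) - n + 1)}
    return kmers_a & kmers_b


# Two made-up sequences sharing one identical 20-residue stretch.
stretch = "ACDEFGHIKLMNPQRSTVWY"
protein_1 = "GGG" + stretch + "WWW"
protein_2 = "PPP" + stretch + "TTT"
excluded = shared_peptides(protein_1, protein_2, 5)
print(windows_in_stretch(20, 5), len(excluded))
```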

In another embodiment, the identified PET for a given protein may be
rank-ordered based on certain criteria, so that higher ranking PETs are preferred to be
used in generating specific capture agents.

For example, certain PET may naturally exist on the protein surface, thus
making good candidates for being a soluble peptide when digested by a protease. On
the other hand, certain PET may exist in an internal or core region of a protein, and
may not be readily soluble even after digestion. Such solubility properties may be
evaluated by available software. The solvent accessibility method described in
Boger, J., Emini, E.A. & Schmidt, A., Surface probability profile-An heuristic
approach to the selection of synthetic peptide antigens, Reports on the Sixth
International Congress in Immunology (Toronto) 1986 p.250 also may be used to
identify PETs that are located on the surface of the protein of interest. The package
MOLMOL (Koradi, R. et al. (1996) J. Mol. Graph. 14:51-55) and Eisenhaber's
ASC method (Eisenhaber and Argos (1993) J. Comput. Chem. 14:1272-1280;
Eisenhaber et al. (1995) J. Comput. Chem. 16:273-284) may also be used. Surface
PETs generally have higher ranking than internal PETs. In one embodiment,
logP or logD values can be calculated for a PET, or for a proteolytic fragment
containing a PET, and used to rank order the PETs based on
likely solubility under the conditions in which a protein sample is to be contacted with a
capture agent.

Regardless of the manner in which the PETs are generated, an ideal PET is
preferably 8 amino acids in length, and the parental tryptic peptide should be
shorter than 20 amino acids. This is because antibodies typically recognize peptide
epitopes of 4 - 8 amino acids, and thus peptides of 12-20 amino acids are
conventionally used for antibody production.

Since trypsin is a preferred digestion enzyme in certain embodiments, a PET
in these embodiments should not contain K or R in the middle of the sequence, so
that the PET will not be cleaved by trypsin during sample preparation. In a more
general sense, the selected PET should not contain or overlap a digestion site such
that the PET is expected to be destroyed after digestion, unless an assay specifically
prefers that a PET be destroyed after digestion.
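The trypsin constraint can be checked mechanically. The sketch below assumes a simplified cleavage rule (cleavage C-terminal to every K or R, ignoring the proline exception and missed cleavages); the function names and the example protein sequence are hypothetical.

```python
def survives_trypsin(pet):
    """True if no K or R occurs before the last residue of the PET.

    Simplified rule: trypsin is assumed to cut after every K or R, so a K/R
    at the final position leaves the PET intact, while any earlier K/R splits it.
    """
    return not any(residue in "KR" for residue in pet[:-1])


def tryptic_peptides(protein):
    """Split a protein after every K or R (same simplified rule)."""
    peptides, start = [], 0
    for index, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def parental_peptide(protein, pet):
    """Return the tryptic peptide that carries the PET, or None if it is cut."""
    for peptide in tryptic_peptides(protein):
        if pet in peptide:
            return peptide
    return None


# Hypothetical protein sequence, for illustration only.
protein = "MAGTKLSDEFGHIWPRQQA"
parent = parental_peptide(protein, "SDEFGHIW")
```

The length of the returned parental peptide can then be compared against the preferred upper bound of 20 amino acids discussed above.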

In addition, an ideal PET preferably does not have a hydrophobic parental
tryptic peptide, is highly antigenic, and has the smallest number (preferably none)
of closely related peptides (nearest neighbor peptides or NNP) as defined by nearest
neighbor analysis.

Any PET may also be associated with an annotation, which may contain
useful information such as: whether the PET may be destroyed by a certain protease
(such as trypsin), whether it is likely to appear on a digested peptide with a relatively
rigid or flexible structure, etc. These characteristics may help to rank order the PETs
for use in generating specific capture agents, especially when there are a large
number of PETs associated with a given protein. Since PETs may change depending
on the particular use in a given organism, the ranking order may change depending
on specific usages. A PET that ranks low because it is likely to be destroyed by a
certain protease may rank higher in a different fragmentation scheme using a
different protease.

In another embodiment, the computational algorithm for selecting optimal
PETs from a protein for antibody generation takes antibody-peptide interaction data
into consideration. A process such as Nearest-Neighbor Analysis (NNA) can be
used to select the most unique PET for each protein. Each PET in a protein is given a
relative score, or PET Uniqueness Index, that is based on the number of nearest
neighbors it has. The higher the PET Uniqueness Index, the more unique the PET is.
The PET Uniqueness Index can be calculated using an Amino Acid Replacement
Matrix such as the one in Table VIII of Getzoff, ED, Tainer JA and Lerner RA. The
chemistry and mechanism of antibody binding to protein antigens. 1988. Advances
in Immunology 43: 1-97. In this matrix, the replaceability of each amino acid by the
remaining 19 amino acids was calculated based on experimental data on antibody
cross-reactivity to a large number of peptides with single mutations (replacing each
amino acid in a peptide sequence by the remaining 19 amino acids). For example,
each octamer PET from a protein is compared to the 8.7 million octamers present in
the human proteome and a PET Uniqueness Index is calculated. This process not only
selects the most unique PET for a particular protein, it also identifies the Nearest
Neighbor Peptides for this PET. This becomes important for defining the
cross-reactivity of PET-specific antibodies, since Nearest Neighbor Peptides are the
ones most likely to cross-react with a particular antibody.
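A nearest-neighbor search of this kind might be sketched as below. The scoring formula is an illustrative stand-in and is not taken from the specification or from Getzoff et al.: neighbors are assumed to be peptides differing by a single substitution, each neighbor is weighted by a hypothetical replaceability value (1.0 when none is supplied), and the index is taken as 1 / (1 + total weight).

```python
def hamming(a, b):
    """Number of positions at which two equal-length peptides differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbors(pet, proteome_peptides, max_substitutions=1):
    """Peptides of the same length differing from the PET at few positions."""
    return sorted(
        peptide for peptide in proteome_peptides
        if len(peptide) == len(pet)
        and 0 < hamming(pet, peptide) <= max_substitutions
    )


def uniqueness_index(pet, proteome_peptides, replaceability=None):
    """Illustrative score in (0, 1]; 1.0 means no nearest neighbors were found.

    `replaceability` is a hypothetical dict {(original, substitute): weight}
    standing in for an amino acid replacement matrix.
    """
    replaceability = replaceability or {}
    penalty = 0.0
    for neighbor in nearest_neighbors(pet, proteome_peptides):
        weight = 1.0
        for original, substitute in zip(pet, neighbor):
            if original != substitute:
                weight *= replaceability.get((original, substitute), 1.0)
        penalty += weight
    return 1.0 / (1.0 + penalty)


# Hypothetical octamers and placeholder weights, for illustration only.
octamers = {"ACDEFGHI", "ACDEFGHL", "ACDEFGHV", "WWWWWWWW"}
weights = {("I", "L"): 0.5, ("I", "V"): 0.5}
neighbors = nearest_neighbors("ACDEFGHI", octamers)
score = uniqueness_index("ACDEFGHI", octamers, weights)
```

The list of neighbors returned alongside the score corresponds to the Nearest Neighbor Peptides that would be tested for antibody cross-reactivity.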

Besides the PET Uniqueness Index, the following parameters for each PET may
also be calculated to help rank the PETs:

a) PET Solubility Index: which involves calculating LogP and LogD of
the PET.

b) PET Hydrophobicity & water accessibility: only hydrophilic peptides
and peptides with good water accessibility will be selected.

c) PET Length: since longer peptides tend to have conformations in
solution, we use PET peptides with a defined length of 8 amino acids. PET-specific
antibodies will have better defined specificity due to the limited number of epitopes
in a shorter peptide sequence. This is very important for multiplexing assays using
these antibodies. In one embodiment, only antibodies generated in this way will be
used for multiplexing assays.

d) Evolutionary Conservation Index: each human PET will be compared
with other species to see whether a PET sequence is conserved across species.
Ideally, PETs with minimal conservation, for example, between mouse and human
sequences will be selected. This will maximize the possibility of generating a good
immune response and monoclonal antibodies in mouse.
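Several of the parameters listed above can be combined into a single ordering. The sketch below is one possible arrangement under stated assumptions: the Kyte-Doolittle hydropathy scale is swapped in as a hydrophobicity measure (the specification does not name a scale), conservation is reduced to an exact-match test against hypothetical mouse sequences, and LogP/LogD and water accessibility are left out.

```python
# Kyte-Doolittle hydropathy values (more negative = more hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy of a peptide."""
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


def rank_pets(pets, mouse_sequences, length=8):
    """Order candidate PETs: non-conserved first, then most hydrophilic first.

    Candidates that do not have the required length are dropped.
    """
    candidates = [pet for pet in pets if len(pet) == length]

    def sort_key(pet):
        conserved = any(pet in sequence for sequence in mouse_sequences)
        return (conserved, mean_hydropathy(pet))

    return sorted(candidates, key=sort_key)


# Hypothetical candidates and mouse sequence, for illustration only.
candidates = ["SDEQNTGS", "LLIVVAFL", "SDEQNTGA", "ACDEF"]
mouse = ["MMSDEQNTGAPP"]
ranking = rank_pets(candidates, mouse)
```

In this toy case the octamer that also occurs in the mouse sequence is ranked last, and the hydrophilic non-conserved octamer is ranked ahead of the hydrophobic one.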


VI. Applications of the Invention

A. Investigative and Diagnostic Applications 

The capture agents of the present invention provide a powerful tool in
probing living systems and in diagnostic applications (e.g., clinical, environmental
and industrial, and food safety diagnostic applications). For clinical diagnostic
applications, the capture agents are designed such that they bind to one or more PETs
corresponding to one or more diagnostic targets (e.g., a disease related protein,
collection of proteins, or pattern of proteins). Specific individual disease related
proteins include, for example, prostate-specific antigen (PSA), prostatic acid
phosphatase (PAP) or prostate specific membrane antigen (PSMA) (for diagnosing
prostate cancer); Cyclin E (for diagnosing breast cancer); Annexin, e.g., Annexin V
(for diagnosing cell death in, for example, cancer, ischemia, or transplant rejection);
or β-amyloid plaques (for diagnosing Alzheimer's Disease).

Thus, PETs and the capture agents of the present invention may be used as a
source of surrogate markers. For example, they can be used as markers of disorders
or disease states, as markers for precursors of disease states, as markers for
predisposition to disease states, as markers of drug activity, or as markers of the
pharmacogenomic profile of protein expression.

As used herein, a "surrogate marker" is an objective biochemical marker
which correlates with the absence or presence of a disease or disorder, or with the
progression of a disease or disorder (e.g., with the presence or absence of a tumor).
The presence or quantity of such markers is independent of the causation of the
disease. Therefore, these markers may serve to indicate whether a particular course
of treatment is effective in lessening a disease state or disorder. Surrogate markers
are of particular use when the presence or extent of a disease state or disorder is
difficult to assess through standard methodologies (e.g., early stage tumors), or when
an assessment of disease progression is desired before a potentially dangerous
clinical endpoint is reached (e.g., an assessment of cardiovascular disease may be
made using a PET corresponding to a protein associated with a cardiovascular
disease as a surrogate marker, and an analysis of HIV infection may be made using a
PET corresponding to an HIV protein as a surrogate marker, well in advance of the
undesirable clinical outcomes of myocardial infarction or fully-developed AIDS).
Examples of the use of surrogate markers in the art include: Koomen et al. (2000) J.
Mass Spectrom. 35:258-264; and James (1994) AIDS Treatment News Archive 209.

Perhaps the most significant use of the invention is that it enables practice of
a powerful new protein expression analysis technique: analyses of samples for the
presence of specific combinations of proteins and specific levels of expression of
combinations of proteins. This is valuable in molecular biology investigations
generally, and particularly in development of novel assays. Thus, this invention
permits one to identify proteins, groups of proteins, and protein expression patterns
present in a sample which are characteristic of some disease, physiologic state, or
species identity. Such multiparametric assay protocols may be particularly
informative if the proteins being detected are from disconnected or remotely
connected pathways. For example, the invention might be used to compare protein
expression patterns in tissue, urine, or blood from normal patients and cancer
patients, and to discover that in the presence of a particular type of cancer a first
group of proteins are expressed at a higher level than normal and another group are
expressed at a lower level. As another example, the protein chips might be used to
survey protein expression levels in various strains of bacteria, to discover patterns of
expression which characterize different strains, and to determine which strains are
susceptible to which antibiotic. Furthermore, the invention enables production of
specialty assay devices comprising arrays or other arrangements of capture agents
for detecting specific patterns of specific proteins. Thus, to continue the example, in
accordance with the practice of the invention, one can produce a chip which can be
exposed to a cell lysate preparation from a patient or a body fluid to reveal the
presence or absence or pattern of expression informative that the patient is cancer
free, or is suffering from a particular cancer type. Alternatively, one might produce a
protein chip that would be exposed to a sample and read to indicate the species of
bacteria in an infection and the antibiotic that will destroy it.

A junction PET is a peptide which spans the region of a protein
corresponding to a splice site of the RNA which encodes it. Capture agents designed
to bind to a junction PET may be included in such analyses to detect splice variants
as well as gene fusions generated by chromosomal rearrangements, e.g.,
cancer-associated chromosomal rearrangements. Detection of such rearrangements
may lead to a diagnosis of a disease, e.g., cancer. It is now becoming apparent that
splice variants are common and that mechanisms for controlling RNA splicing have
evolved as a control mechanism for various physiological processes. The invention
permits detection of expression of proteins encoded by such species, and correlation
of the presence of such proteins with disease or abnormality. Examples of
cancer-associated chromosomal rearrangements include: translocation
t(16;21)(p11;q22) between genes FUS-ERG associated with myeloid leukemia and
non-lymphocytic, acute leukemia (see Ichikawa H. et al. (1994) Cancer Res.
54(11):2865-8); translocation t(21;22)(q22;q12) between genes ERG-EWS associated
with Ewing's sarcoma and neuroepithelioma (see Kaneko Y. et al. (1997) Genes
Chromosomes Cancer 18(3):228-31); translocation t(14;18)(q32;q21) involving the
bcl2 gene and associated with follicular lymphoma; and translocations juxtaposing
the coding regions of the PAX3 gene on chromosome 2 and the FKHR gene on
chromosome 13 associated with alveolar rhabdomyosarcoma (see Barr F.G. et al.
(1996) Hum. Mol. Genet. 5:15-21).
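Candidate junction PETs for a splice junction or a fusion breakpoint can be enumerated as the peptides that contain at least one residue from each side of the junction. The sketch below is illustrative only: the function name and the two flanking sequences are hypothetical, and the junction is assumed to fall cleanly between two codons.

```python
def junction_peptides(upstream, downstream, k):
    """All k-residue peptides containing residues from both sides of a junction.

    `upstream` and `downstream` are the amino acid sequences encoded on either
    side of the splice site or fusion breakpoint.
    """
    joined = upstream + downstream
    junction = len(upstream)  # index of the first downstream residue
    first_start = max(junction - k + 1, 0)
    last_start = min(junction - 1, len(joined) - k)
    return [joined[start:start + k] for start in range(first_start, last_start + 1)]


# Hypothetical flanking sequences, for illustration only.
spanning = junction_peptides("ACDEFGHI", "KLMNPQRS", 5)
```

For flanks longer than k - 1 residues this yields k - 1 candidates, which would then be screened for uniqueness in the same way as any other candidate PET.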

For applications in environmental and industrial diagnostics, the capture
agents are designed such that they bind to one or more PETs corresponding to a
biowarfare agent (e.g., anthrax, smallpox, cholera toxin) and/or one or more PETs
corresponding to other environmental toxins (Staphylococcus aureus α-toxin, Shiga
toxin, cytotoxic necrotizing factor type 1, Escherichia coli heat-stable toxin, and
botulinum and tetanus neurotoxins) or allergens. The capture agents may also be
designed to bind to one or more PETs corresponding to an infectious agent such as a
bacterium, a prion, a parasite, or a PET corresponding to a virus (e.g., human
immunodeficiency virus-1 (HIV-1), HIV-2, simian immunodeficiency virus (SIV),
hepatitis C virus (HCV), hepatitis B virus (HBV), Influenza, Foot and Mouth
Disease virus, and Ebola virus).

The following part illustrates the general idea of diagnostic use of the instant
invention in one specific setting - serum biomarker assays.




WO 2005/078453 


PCT/US2005/003634 


The proteins found in human plasma perform many important functions in
the body. Over- or under-expression of these proteins can thus cause disease directly,
or reveal its presence. Studies have shown that complex serum proteomic patterns
might reflect the underlying pathological state of an organ such as the ovary
(Petricoin et al., Lancet 359: 572-577, 2002). Therefore, the easy accessibility of
serum samples, and the fact that serum comprehensively samples the human
phenotype - the state of the body at a particular point in time - make serum an
attractive option for a broad array of applications, including clinical and diagnostic
applications (early detection and diagnosis of disease, monitoring disease
progression, monitoring therapy, etc.), discovery applications (such as novel
biomarker discovery), and drug development (drug efficacy and toxicity, and
personalized medicine). In fact, over $1 billion annually is spent on immunoassays
to measure proteins in plasma as indicators of disease (Plasma Proteome Institute
(PPI), Washington, D.C.).

Despite decades of research, only a handful of proteins (about 20) among the
500 or so detected proteins in plasma are measured routinely for diagnostic
purposes. These include: cardiac proteins (troponins, myoglobin, creatine kinase) as
indicators of heart attack; insulin, for management of diabetes; liver enzymes
(alanine or aspartate transaminases) as indicators of drug toxicity; and coagulation
factors for management of clotting disorders. About 150 proteins in plasma are
measured by some laboratory for diagnosis of less common diseases.

In addition, proteins in plasma differ in concentration by at least one billion-fold.
For example, serum albumin has a normal concentration range of 35-50 mg/mL
(35-50 x 10^9 pg/mL) and is measured clinically as an indication of severe liver
disease or malnutrition, while interleukin 6 (IL-6) has a normal range of just 0-5
pg/mL, and is measured as a sensitive indicator of inflammation or infection.
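The unit conversion behind this comparison is short enough to check directly. The snippet below only restates the two ranges quoted above; the variable names are illustrative.

```python
PG_PER_MG = 10**9  # 1 mg = 10^9 pg

albumin_mg_per_ml = (35, 50)  # normal serum albumin range quoted above
il6_pg_per_ml = (0, 5)        # normal IL-6 range quoted above

albumin_pg_per_ml = tuple(value * PG_PER_MG for value in albumin_mg_per_ml)

# The lower IL-6 bound is 0, so the ratio uses the upper bound of each range.
fold_difference = albumin_pg_per_ml[1] // il6_pg_per_ml[1]
```

With these figures the two analytes differ by a factor of 10^10 at the upper ends of their normal ranges, consistent with the "at least one billion-fold" spread stated above.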

Thus, there is a need for reference levels of all serum proteins, and reliable
assays for measuring serum protein levels under any conditions. However,
standardization of immunoassays for heterogeneous antigens was nearly impossible
about 10 years ago (Ekins, Scand J Clin Lab Invest. 205: 33-46, 1991). One of the
major obstacles is the apparent need for an identical standard and analyte. This is
the case with only a few small peptides. With larger peptides and proteins, the
problems tend to become more complicated because biological samples often
contain proforms, splice variants, fragments, and complexes of the analyte
(Stenman, Clinical Chemistry 47: 815-820, 2001). One such problem is illustrated
by measuring serum TGF-beta levels.

The TGF-beta superfamily proteins are a collection of structurally related
multi-function proteins that have a diverse array of biological functions including
wound healing, development, oncogenesis, and atherosclerosis. There are at least
three known mammalian TGF-beta proteins (beta1, beta2 and beta3), which are
thought to have similar functions, at least in vitro. Each of the three isoforms is
produced as a pre-pro-protein, which rapidly dimerizes. After the loss of the signal
sequences, sugar moieties are added to the pro-protein regions known as the Latency
Associated Peptide, or LAP. In addition, there is proteolytic cleavage between the
LAPs and the mature dimers (the functional portion), but the cleaved LAPs still
associate with the mature dimer, forming a complex known as the small latent
complex. Either prior to secretion, or in the extracellular milieu, the small latent
complex can bind to a large number of other proteins, forming a large number of
higher molecular weight latent complexes. The best characterized of these proteins
are the latent TGF-beta binding protein family LTBP1-4 and fibrillin-1 and -2 (see
Figure 9). Once in the extracellular environment, the TGF-beta complex may bind
even more proteins to form other complexes. Known soluble TGF-beta binding
proteins include: decorin, alpha-fetoprotein (AFP), betaglycan extracellular domain,
β-amyloid precursor, and fetuin. Given the various isoforms, complexes, processing
stages, etc., it is very difficult to accurately measure serum TGF-beta protein levels,
and a 100-fold range in serum levels of TGF-beta1 has been reported by
different groups (see Grainger et al., Cytokine & Growth Factor Reviews
11: 133-145, 2000).

The other problem arises from the false positive / negative effects of
anti-animal antibodies on immunoassays. Specifically, in a sandwich-type assay for a
specific antigen in a serum sample, instead of capturing the desired antigen, the
immobilized capture antibody may bind to anti-animal antibodies in the serum
sample, which in turn can be bound by the labeled secondary antibody and give rise
to a false positive result. On the other hand, too many anti-animal antibodies may
block the interaction between the capture antibody and the desired antigen, and the
interaction between the labeled secondary antibody and the desired antigen, leading
to a false negative result. This is a serious problem demonstrated in a recent study by
Rotmensch and Cole (Lancet 355: 712-715, 2000), which shows that in all 12 cases
where women were diagnosed as having postgestational choriocarcinoma on the
basis of persistently positive human chorionic gonadotropin (hCG) test results in the
absence of pregnancy, a false diagnosis had been made, and most of the women had
been subjected to needless surgery or chemotherapy. Such diagnostic problems
associated with anti-animal antibodies have also been reported elsewhere (Hennig et
al., The influence of naturally occurring heterophilic anti-immunoglobulin
antibodies on direct measurement of serum proteins using sandwich ELISAs.
Journal of Immunological Methods 235: 71-80, 2000; Covinsky et al., An IgM λ
Antibody to Escherichia coli Produces False-Positive Results in Multiple
Immunometric Assays. Clinical Chemistry 46: 1157-1161, 2000).

All these problems can be efficiently solved by the methods of the instant
invention. By digesting serum samples and converting all forms of the target protein
to a uniform PET-containing peptide, the methods of the instant invention greatly
reduce the complexity of the sample. Anti-animal antibodies, protein complexes,
and various isoforms are no longer expected to be a significant factor in the digested
serum sample, thus facilitating more reliable, reproducible, and accurate results from
assay to assay.

The method of the instant invention is by no means limited to one particular
serum protein such as TGF-beta. It has broad applications in a wide range of serum
proteins, including peptide hormones, candidate disease biomarkers (such as PSA,
CA125, MMPs, etc.), serum disease and non-disease biomarkers, and acute phase
response proteins. For example, measuring the following types of serum biomarkers
will have broad applications in clinical and diagnostic uses: 1) disease state markers
(such as markers for inflammation, infection, etc.), and 2) non-disease state markers,
including markers indicating drug and hormone effects (e.g., alcohol, androgens,
anti-epileptics, estrogen, pregnancy, hormone replacement therapy, etc.). Exemplary
serum proteins that can be measured include: ApoA-I, Androgens, AAT, AAG,
A2M, Alb, Apo-B, AT III, C3, Cp, C4, CRP, SAA, Hp, AGP, Fb, AP, FIB, FER,
PAL, PSM, Tf, IgA, IgG, IgM, IgE, FN, B2M, and RBP.

One preferred assay method for these serum proteins is the sandwich assay
using a PET-specific capture agent and at least one labeled secondary capture
agent for detection of binding. These assays may be performed in an array format
according to the teaching of the instant application, in that different capture agents
(such as PET-specific antibodies) can be arrayed on a single (or a few) microarrays
for use in simultaneous detection / quantitation of a large number of serum
biomarkers.

Foundation for Blood Research (FBR, Scarborough, ME) has developed a
152-page guide on serum protein utility and interpretation for day to day use by
practitioners and laboratorians. This guide contains a distillation of the world's
literature on the subject, is fully indexed, and is presented by disease state
(Section I), as well as by individual protein (Section II). This book is generally
useful for interpretation of test results, as well as providing guidance regarding
which test is (or is not) appropriate to order and why (or why not). Section II, which
covers general information on serum proteins, is also helpful regarding background
information about each protein. The entire content of the guide is incorporated
herein by reference.


B. High-Throughput Screening 

Compositions containing the capture agents of the invention, e.g.,
microarrays, beads or chips, enable the high-throughput screening of very large
numbers of compounds to identify those compounds capable of interacting with a
particular capture agent, or to detect molecules which compete for binding with the
PETs. Microarrays are useful for screening large libraries of natural or synthetic
compounds to identify competitors of natural or non-natural ligands for the capture
agent, which may be of diagnostic, prognostic, therapeutic or scientific interest.

The use of microarray technology with the capture agents of the present
invention enables comprehensive profiling of large numbers of proteins from normal
and diseased-state serum, cells, and tissues.

For example, once the microarray has been formed, it may be used for
high-throughput drug discovery (e.g., screening libraries of compounds for their
ability to bind to or modulate the activity of a target protein); for high-throughput
target identification (e.g., correlating a protein with a disease process); for
high-throughput target validation (e.g., manipulating a protein by, for example,
mutagenesis and monitoring the effects of the manipulation on the protein or on
other proteins); or in basic research (e.g., to study patterns of protein expression at,
for example, key developmental or cell cycle time points or to study patterns of
protein expression in response to various stimuli).

In one embodiment, the invention provides a method for identifying a test
compound, e.g., a small molecule, that modulates the activity of a ligand of interest.
According to this embodiment, a capture agent is exposed to a ligand and a test
compound. The presence or the absence of binding between the capture agent and
the ligand is then detected to determine the modulatory effect of the test compound
on the ligand. In a preferred embodiment, a microarray of capture agents that bind
to ligands acting in the same cellular pathway is used to profile the regulatory
effect of a test compound on all these proteins in a parallel fashion.

C. Pharmacoproteomics

The capture agents or arrays comprising the capture agents of the present
invention may also be used to study the relationship between a subject's protein
expression profile and that subject's response to a foreign compound or drug.
Differences in metabolism of therapeutics can lead to severe toxicity or therapeutic
failure by altering the relation between dose and blood concentration of the
pharmacologically active drug. Thus, use of the capture agents in the foregoing
manner may aid a physician or clinician in determining whether to administer a
pharmacologically active drug to a subject, as well as in tailoring the dosage and/or
therapeutic regimen of treatment with the drug.

D. Protein Profiling

As indicated above, capture agents of the present invention enable the
characterization of any biological state via protein profiling. The term "protein
profile," as used herein, includes the pattern of protein expression obtained for a
given tissue or cell under a given set of conditions. Such conditions may include, but
are not limited to, cellular growth, apoptosis, proliferation, differentiation,
transformation, tumorigenesis, metastasis, and carcinogen exposure.

The capture agents of the present invention may also be used to compare the
protein expression patterns of two cells or different populations of cells. Methods of
comparing the protein expression of two cells or populations of cells are particularly
useful for the understanding of biological processes. For example, using these
methods, the protein expression patterns of identical cells or closely related cells
exposed to different conditions can be compared. Most typically, the protein content
of one cell or population of cells is compared to the protein content of a control cell
or population of cells. As indicated above, one of the cells or populations of cells
may be neoplastic and the other not. In another embodiment, one of the two
cells or populations of cells being assayed may be infected with a pathogen.
Alternatively, one of the two cells or populations of cells has been exposed to a
chemical, environmental, or thermal stress and the other cell or population of cells
serves as a control. In a further embodiment, one of the cells or populations of cells
may be exposed to a drug or a potential drug and its protein expression pattern
compared to that of a control cell.

Such methods of assaying differential protein expression are useful in the
identification and validation of new potential drug targets as well as for drug
screening. For instance, the capture agents and the methods of the invention may be
used to identify a protein which is overexpressed in tumor cells, but not in normal
cells. This protein may be a target for drug intervention. Inhibitors to the action of
the overexpressed protein can then be developed. Alternatively, antisense strategies
to inhibit the overexpression may be developed. In another instance, the protein
expression pattern of a cell, or population of cells, which has been exposed to a drug
or potential drug can be compared to that of a cell, or population of cells, which has
not been exposed to the drug. This comparison will provide insight as to whether the
drug has had the desired effect on a target protein (drug efficacy) and whether other
proteins of the cell, or population of cells, have also been affected (drug specificity).


E. Protein Sequencing, Purification and Characterization

The capture agents of the present invention may also be used in protein
sequencing. Briefly, capture agents are raised that interact with a known
combination of unique recognition sequences. Subsequently, a protein of interest is
fragmented using the methods described herein to generate a collection of peptides
and then the sample is allowed to interact with the capture agents. Based on the
interaction pattern between the collection of peptides and the capture agents, the
amino acid sequence of the collection of peptides may be deciphered. In a preferred
embodiment, the capture agents are immobilized on an array in pre-determined
positions that allow for easy determination of peptide-capture agent interactions.
These sequencing methods would further allow the identification of amino acid
polymorphisms, e.g., single amino acid polymorphisms, or mutations in a protein of
interest.

In another embodiment, the capture agents of the present invention may also
be used in protein purification. In this embodiment, the PET acts as a ligand/affinity
tag and allows for affinity purification of a protein. A capture agent raised against a
PET exposed on a surface of a protein may be coupled to a column of interest using
art-known techniques. The choice of a column will depend on the amino acid
sequence of the capture agent and which end will be linked to the matrix. For
example, if the amino-terminal end of the capture agent is to be linked to the matrix,
matrices such as the Affigel (by Biorad) may be used. If a linkage via a cysteine
residue is desired, an Epoxy-Sepharose-6B column (by Pharmacia) may be used. A
sample containing the protein of interest may then be run through the column and
the protein of interest may be eluted using art-known techniques as described in, for
example, J. Nilsson et al. (1997) "Affinity fusion strategies for detection,
purification, and immobilization of recombinant proteins," Protein Expression and
Purification, 11:11-16, the contents of which are incorporated by reference. This
embodiment of the invention also allows for the characterization of protein-protein
interactions under native conditions, without the need to introduce artificial affinity
tags in the protein(s) to be studied.

In yet another embodiment, the capture agents of the present invention may
be used in protein characterization. Capture agents can be generated that
differentiate between alternative forms of the same gene product, e.g., between
proteins having different post-translational modifications (e.g., phosphorylated
versus non-phosphorylated versions of the same protein or glycosylated versus
non-glycosylated versions of the same protein) or between alternatively spliced gene
products.

The utility of the invention is not limited to diagnosis. The system and
methods described herein may also be useful for screening, making prognosis of
disease outcomes, and providing treatment modality suggestion based on the
profiling of the pathologic cells, prognosis of the outcome of a normal lesion and
susceptibility of lesions to malignant transformation.


F. Detection of Post-translational Modifications 

The subject computer generated PETs can also be analyzed according to the
likely presence or absence of post-translational modifications. More than 100
different such modifications of amino acid residues are known; examples include but
are not limited to acetylation, amidation, deamidation, prenylation (such as
farnesylation or geranylation), formylation, glycosylation, hydroxylation,
methylation, myristoylation, phosphorylation, ubiquitination, ribosylation and
sulphation. Sequence analysis software capable of determining putative
post-translational modifications in a given amino acid sequence includes the NetPhos
server, which produces neural network predictions for serine, threonine and tyrosine
phosphorylation sites in eukaryotic proteins (available through
http://www.cbs.dtu.dk/services/NetPhos/), GPI Modification Site Prediction
(available through http://mendel.imp.univie.ac.at/gpi) and the ExPASy proteomics
server for total protein analysis (available through www.expasy.ch/tools/).

In certain embodiments, preferred PET moieties are those lacking any
post-translational modification sites, since post-translationally modified amino acid
sequences may complicate sample preparation and/or interaction with a capture
agent. Notwithstanding the above, capture agents that can discriminate between
post-translationally modified forms of a PET, which may indicate a biological
activity of the polypeptide-of-interest, can be generated and used in the present
invention. A very common example is the phosphorylation of the OH group of the
amino acid side chain of a serine, a threonine, or a tyrosine residue in a polypeptide.
Depending on the polypeptide, this modification can increase or decrease its
functional activity. In one embodiment, the subject invention provides an array of
capture agents that are variegated so as to provide discriminatory binding and
identification of various post-translationally modified forms of one or more proteins.
In a preferred alternative embodiment, the subject invention provides an array of
capture agents that are variegated so as to provide specific binding to one or more
PETs uniquely associated with a modification of interest, which modification itself
can be readily detected and/or quantitated by additional agents, such as a labeled
secondary antibody specifically recognizing the modification (e.g., a
phospho-tyrosine antibody).

In a general sense, the invention provides a general means to detect /
quantitate protein modifications. "Modification" here refers generally to any kind of
non-wildtype change in amino acid sequence, including post-translational
modification, alternative splicing, polymorphism, insertion, deletion, point
mutation, etc. To detect / quantitate a specific modification within a potential target
protein present in a sample, the sequence of the target protein is first analyzed to
identify potential modification sites (such as phosphorylation sites for a specific
kinase). Next, a potential fragment of the target protein containing such a modification
site is identified. The fragment is specific for a selected method of treatment, such as
tryptic digestion, digestion by another protease, or reliable chemical fragmentation.
A PET within (and unique to) the modification site-containing fragment can then be
identified using the method of the instant invention. Fragmentation using a
combination of two or more methods is also contemplated. Absolute predictability
of the fragment size is desired, but not necessary, as long as the fragment always
contains the desired PET and the modification site.
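
The fragment-selection step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy sequence, the function names, the simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), and the requirement that a candidate PET not overlap the modified residue. It is not the PET identification method of the invention itself, and it does not test PET uniqueness across a proteome.

```python
def tryptic_digest(sequence):
    """Return (start, peptide) pairs, with start as a 1-based position.

    Simplified rule (an assumption): cleave after K or R unless the next
    residue is P. Missed cleavages are ignored.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleaves = residue in "KR" and not is_last and sequence[i + 1] != "P"
        if cleaves or is_last:
            fragments.append((start + 1, sequence[start:i + 1]))
            start = i + 1
    return fragments


def fragment_with_site(sequence, site_position):
    """Find the tryptic fragment that contains a 1-based site position."""
    for start, peptide in tryptic_digest(sequence):
        if start <= site_position < start + len(peptide):
            return start, peptide
    raise ValueError("site position lies outside the sequence")


def pet_is_usable(peptide, pet, site_offset):
    """Check that a candidate PET lies in the peptide and avoids the site.

    site_offset is the 0-based index of the modified residue in the peptide.
    Non-overlap with the site is an illustrative assumption.
    """
    pet_start = peptide.find(pet)
    if pet_start < 0:
        return False
    return not (pet_start <= site_offset < pet_start + len(pet))


# Hypothetical toy sequence with a tyrosine of interest at position 10.
toy_sequence = "MAGRKLSPEYTDKRPSVLK"
start, peptide = fragment_with_site(toy_sequence, 10)
site_offset = 10 - start  # 0-based index of the tyrosine within the peptide
print(start, peptide, pet_is_usable(peptide, "LSP", site_offset))
```

In practice the candidate PET would come from a proteome-wide uniqueness analysis; the check here only confirms that the tag and the modification site sit on the same fragment.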

An antibody or other capture agent specific for the identified PET is then obtained.
The capture agent is used in a sandwich ELISA format to detect captured
fragments containing the modification (see Figure 22). The site of the PET is
proximal to the post-translational modification site(s). Thus, binding of a capture
agent to the PET will not interfere with the binding of a detection agent specific for
the modified residue.

A few specific embodiments of this aspect of the invention are described in
more detail below (see Figure 23). For illustrative purposes only, the capture agents
described below in various embodiments of the invention are antibodies specific for
PETs. However, it should be understood that any capture agents described above can
be used in each of the following embodiments.


(i) Phosphorylation 

The reversible addition of phosphate groups to proteins is important for the
transmission of signals within eukaryotic cells and, as a result, protein
phosphorylation and dephosphorylation regulate many diverse cellular processes. To
detect the presence and/or quantitate the amount of a phosphorylated peptide in a
sample, anti-phospho-amino acid antibodies can be used.

There are numerous commercially available phospho-tyrosine specific
antibodies that can be adapted for use in the instant invention. Merely to
illustrate, phosphotyrosine antibody (ab2287) [13F9] of Abcam Ltd (Cambridge,
UK) is a mouse IgG1 isotype monoclonal antibody that reacts specifically with
phosphotyrosine and shows minimal reactivity by ELISA and competitive ELISA
with phosphoserine or phosphothreonine. The antibody reacts with free
phosphotyrosine, phosphotyrosine conjugated to carriers such as thyroglobulin or
BSA, and detects the presence of phosphotyrosine in proteins of both unstimulated
and stimulated cell lysates.

Similarly, RESEARCH DIAGNOSTICS INC (Flanders, NJ) provides a few
similar anti-phosphotyrosine antibodies. Among them, RDI-PHOSTYRabmb is a
mouse mIgG2b isotype monoclonal antibody that reacts strongly and specifically with
phosphotyrosine-containing proteins and can be blocked specifically with
phosphotyrosine. No reaction with either phosphothreonine or phosphoserine is
detected. This antibody appears to have broad cross-species reactivity, and is
reactive with various tyrosine-phosphorylated proteins of human, chick, frog, rat,
mouse and dog origin.

RESEARCH DIAGNOSTICS INC also provides phosphoserine-specific
antibodies, such as RDI-PHOSSERabr, which is an affinity-purified rabbit antibody
made against phosphoserine-containing proteins. The antibody reacts specifically
with serine-phosphorylated proteins and shows no significant cross-reactivity to
phosphothreonine or phosphotyrosine by western blot analysis. This antibody
is suitable for ELISA according to the manufacturer's suggestion. The company also
provides a mouse IgG1 monoclonal anti-phosphoserine antibody, RDI-PHOSSEabm,
which reacts specifically with phosphorylated serine, either as the free amino acid or
conjugated to carriers such as BSA or KLH. No cross-reactivity is observed with
non-phosphorylated serine, phosphothreonine, phosphotyrosine, AmpMP or ATP.

RDI-PHOSTHRabr is an affinity-isolated rabbit anti-phosphothreonine
antibody (anti-pT) provided by RESEARCH DIAGNOSTICS INC. Both antigen-capture
and antibody-capture ELISA indicated that the anti-phosphothreonine
antibodies can recognize threonine-phosphorylated protein, phosphothreonine and
lysine-phosphothreonine-glycine random polymer, respectively. Direct, competitive
antigen-capture ELISA demonstrated that the antibodies are specifically inhibited by
free phosphothreonine and phosvitin, but not by free phosphoserine, phosphotyrosine,
threonine or ATP. The company also provides a mouse IgG2b monoclonal
anti-phosphothreonine antibody, RDI-PHOSTHabm, which reacts specifically with
phosphorylated threonine, either as the free amino acid or conjugated to carriers such
as BSA or KLH. No cross-reactivity is observed with non-phosphorylated threonine,
phosphoserine, phosphotyrosine, AmpMP or ATP.

Molecular Probes (Eugene, OR) has developed a small-molecule fluorophore
phosphosensor, referred to as Pro-Q Diamond dye, which is capable of ultrasensitive
global detection and quantitation of phosphorylated amino acid residues in peptides
and proteins displayed on microarrays. The utility of the fluorescent Pro-Q Diamond
phosphosensor dye technology is demonstrated using phosphoproteins and
phosphopeptides as well as with protein kinase reactions performed in a miniaturized
microarray assay format (Martin et al., Proteomics 3: 1244-1255, 2003). Instead of
applying a phosphoamino acid-selective antibody labeled with a fluorescent or
enzymatic tag for detection, a small, fluorescent probe is employed as a universal
sensor of phosphorylation status. The detection limit for phosphoproteins on a
variety of different commercially available protein array substrates was found to be
312-625 fg, depending upon the number of phosphate residues. Characterization of
the enzymatic phosphorylation of immobilized peptide targets with Pro-Q Diamond
dye readily permits differentiation between specific and non-specific peptide
labeling at picogram to subpicogram levels of detection sensitivity. Martin et al.
(supra) also describe in detail suitable protocols and instruments for using the Pro-Q
stain, especially for peptides on microarrays, the entire contents of which are
incorporated herein by reference.

One of the advantages of the method over other methods, such as
identification of modified amino acids in proteins by mass spectrometry, is that the
instant invention provides a much simpler technique that does not rely on expensive
instruments, and thus can be easily adapted for use in small or large laboratories,
in industrial or academic settings alike.

In one embodiment, the instant invention can be used to identify potential
substrates of a specific kinase or kinase subfamily. As the number of known protein
kinases has increased at an ever-accelerating pace, it has become more challenging
to determine which protein kinases interact with which substrates in the cell.
The determination of consensus phosphorylation site motifs by amino acid
sequence alignment of known substrates has proven useful in this pursuit. These
motifs can be helpful for predicting phosphorylation sites for specific protein kinases
within a potential protein substrate. The table below summarizes merely some of the
known data about specificity motifs for various well-studied protein kinases, along
with examples of known phosphorylation sites in specific proteins (for a more
extensive list, see Pearson, R. B., and Kemp, B. E. (1991). In T. Hunter and B. M.
Sefton (Eds.), Methods in Enzymology Vol. 200, pp. 62-81. San Diego: Academic
Press, incorporated by reference). The phosphoacceptor residue is indicated in bold,
amino acids which can function interchangeably at a particular residue are separated
by a slash (/), and residues which do not appear to contribute strongly to recognition
are indicated by an "X". Some protein kinases such as CK1 and GSK-3 contain
phosphoamino acid residues in their recognition motifs, and have been termed
"hierarchical" protein kinases (see Roach, J. Biol. Chem. 266, 14139-14142, 1991
for review). They often require prior phosphorylation by another kinase at a residue
in the vicinity of their own phosphorylation site. S(p) represents such preexisting
phosphoserine residues.


Protein Kinase                    Recognition           Phosphorylation         Protein Substrate
                                  Motifs (a)            Sites (b)               (reference)

cAMP-dependent Protein Kinase     R-X-S/T               [illegible in source]   pyruvate kinase (2)
(PKA, cAPK)                       R-R/K-X-S/T           F1RRLSIST               phosphorylase kinase, α chain (2)
                                                        A29GARRKASGPP           histone H1, bovine (2)

Casein Kinase I (CKI, CK-1)       S(p)-X-X-S/T          [illegible in source]   glycogen synthase, rabbit muscle (4)
                                                        D43IGS(p)ES(p)TEDQ      αs1-casein (4)

Casein Kinase II (CKII, CK-2)     S/T-X-X-E             A72DSESEDEED            PKA regulatory subunit, RII (2)
                                                        L37ESEEEGVPST           p34cdc2, human (5)
                                                        E26DNSEDEISNL           acetyl-CoA carboxylase (2)

Glycogen Synthase Kinase 3        S-X-X-X-S(p)          S641VPPSPSLS(p)         glycogen synthase, human (site 3b) (6,2)
(GSK-3)                                                 S641VPPS(p)PSLS(p)      glycogen synthase, human (site 3a) (6,2)

Cdc2 Protein Kinase               S/T-P-X-R/K (c)       P13AKTPVK               histone H1, calf thymus (2)
                                                        H122STPPKKKRK           large T antigen (2)

Calmodulin-dependent Protein      R-X-X-S/T             N2YLRRRLSDSN            synapsin (site 1) (2)
Kinase II (CaMK II)               R-X-X-S/T-V           K191MARVFSVLR           calcineurin (2)

Mitogen-activated Protein Kinase  P-X-S/T-P (d)         P244LSP                 c-Jun (7)
(Extracellular Signal-regulated   X-X-S/T-P             P92SSP                  cyclin B (7)
Kinase) (MAPK, Erk)                                     V420LSP                 Elk-1 (7)

cGMP-dependent Protein Kinase     R/K-X-S/T             G26KKRKRSRKES           histone H2B (2)
(cGPK)                            R/K-X-X-S/T           F1RRLSIST               phosphorylase kinase (α chain) (2)

Phosphorylase Kinase (PhK)        K/R-X-X-S-V/I         D6QEKRKQISVRG           phosphorylase (2)
                                                        P1LSRTLSVSS             glycogen synthase (site 2) (2)

Protein Kinase C (PKC)            S/T-X-K/R             H[illegible]FGTHSTKR    fibrinogen (2)
                                  K/R-X-X-S/T           P1LSRTLSVSS             glycogen synthase (site 2) (2)
                                  K/R-X-S/T             Q4KRPSQRSKYL            myelin basic protein (2)

Abl Tyrosine Kinase               I/V/L-Y-X-X-P/F (e)

Epidermal Growth Factor Receptor  E/D-Y-X               R1168ENAEYLRVAP         autophosphorylation (2)
Kinase (EGF-RK)                   E/D-Y-I/L/V           A767EPDYGALYE           phospholipase C-γ (2)


Single-letter Amino Acid Code:

A = alanine, C = cysteine, D = aspartic acid, E = glutamic acid, F = phenylalanine, G
= glycine, H = histidine, I = isoleucine, K = lysine, L = leucine, M = methionine, N =
asparagine, P = proline, Q = glutamine, R = arginine, S = serine, T = threonine, W =
tryptophan, V = valine, Y = tyrosine, X = any amino acid


(a) Recognition motifs are taken from Pearson and Kemp (supra) except where noted.
Consult Pearson and Kemp for a comprehensive list of phosphorylation site
sequences and specificity motifs.

(b) Numbers following the first residue (subscripts in the original) refer to the
position of that residue within the given polypeptide chain.

(c) From (1).

(d) From (7).

(e) From (8). See refs (8) and (9) for discussion of substrate recognition by Abl.

References used in the table above:

1. Kennelly, P. J., and Krebs, E. G. (1991) J. Biol. Chem. 266, 15555-15558.

2. Pearson, R. B., and Kemp, B. E. (1991). In T. Hunter and B. M. Sefton
(Eds.), Methods in Enzymology Vol. 200, (pp. 62-81). San Diego:
Academic Press.

3. Roach, P. J. (1991) J. Biol. Chem. 266, 14139-14142.

4. Flotow, H. et al. (1990) J. Biol. Chem. 265, 14264-14269.

5. Russo, G. L. et al. (1992) J. Biol. Chem. 267, 20317-20325.

6. Fiol, C. J. et al. (1990) J. Biol. Chem. 265, 6061-6065.

7. Davis, R. J. (1993) J. Biol. Chem. 268, 14553-14556.

8. Songyang, Z. et al. (1995) Nature 373, 536-539.

9. Geahlen, R. L. and Harrison, M. L. (1990). In B. E. Kemp (Ed.), Peptides
and Protein Phosphorylation, (pp. 239-253). Boca Raton: CRC Press.
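
The motif notation used in the table (hyphen-separated positions, a slash between interchangeable residues, X for any residue, S(p) for a preexisting phosphoserine) maps naturally onto a regular expression. The following Python sketch shows one way this could be done. The function names are hypothetical, and because a plain amino acid sequence does not record phosphorylation state, S(p) is treated here simply as S, which is a simplifying assumption.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'R-R/K-X-S/T' into a regex string."""
    parts = []
    for position in motif.split("-"):
        # A plain sequence cannot show prior phosphorylation, so S(p) -> S.
        position = position.replace("(p)", "")
        options = position.split("/")
        if options == ["X"]:
            parts.append(".")            # X: any amino acid
        elif len(options) == 1:
            parts.append(options[0])     # a single required residue
        else:
            parts.append("[" + "".join(options) + "]")  # alternatives
    return "".join(parts)


def find_motif(sequence, motif):
    """Return (1-based start, matched text) for every, possibly overlapping, hit."""
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# The histone H1 site from the table, searched with the second PKA motif.
print(motif_to_regex("R-R/K-X-S/T"))
print(find_motif("GARRKASGPP", "R-R/K-X-S/T"))
```

The lookahead wrapper lets overlapping hits be reported, which matters for basic stretches where several arginines or lysines could each anchor a match.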

However, since the determinants of protein kinase specificity involve
complex 3-dimensional interactions, these motifs, short amino-acid sequences
describing the primary structure around the phosphoacceptor residue, are a
significant oversimplification of the issue. They do not take into account possible
secondary and tertiary structural elements, or determinants from other polypeptide
chains or from distant locations within the same chain. Furthermore, not all of the
residues described in a particular specificity motif may carry the same weight in
determining recognition and phosphorylation by the kinase. In addition, the potential
recognition sequence may be buried deep inside a tertiary structure or within a
protein complex under physiological conditions and thus may never be accessible in
vivo. As a consequence, the motifs should be used with some caution. The instant
invention provides a fast and convenient way to determine, on a proteome-wide
basis, the identity of all potential kinase substrates that actually do become
phosphorylated by the kinase of interest in vivo (or in vitro).

Specifically, consensus recognition sequences of a kinase (or a kinase
subfamily sharing substrate specificity) can be identified based on, for example,
Pearson and Kemp or another kinase substrate motif database. For example, AKT (or
PKB) kinase has a consensus phosphorylation site sequence of RXRXXS/T. All
proteins in an organism (e.g., human) that contain this potential recognition
sequence can be readily identified through routine sequence searches. Using the
method of the instant invention, peptide fragments of these potential substrates, after
a pre-determined treatment (such as trypsin digestion), which contain both the
recognition motif and at least one PET can then be generated. Antibodies (or other
capture agents) against each of these identified PETs can be raised and printed on an
array to generate a so-called "kinase chip," in this case, an AKT chip. Using this
chip, any sample to be studied can be treated as described above and then be
incubated with the chip so that all potential recognition site-containing fragments are
captured. The presence or absence of phosphorylation on any given "spot" (a
specific potential substrate) can be detected / quantitated by, for example, labeled
secondary antibodies (see Figure 10). Thus, the identity of all AKT substrates in this
organism under this condition may be identified in one experiment. The array can be
reused for other samples by eluting the bound peptides from the array. Different arrays
can be used in combination, preferably in the same experiment, to determine the
substrates for multiple kinases.
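
As a rough illustration of the "routine sequence search" step, the Python sketch below scans a small set of sequences for the RXRXXS/T motif and reports the position of each candidate phosphoacceptor residue. The protein names and sequences are invented placeholders, not real proteins, and the function name is hypothetical. A real search would run over a full proteome database and would be followed by the digestion and PET selection steps described in the text.

```python
import re

# RXRXXS/T written as a regex; the lookahead allows overlapping hits.
AKT_MOTIF = re.compile(r"(?=(R.R..[ST]))")


def candidate_sites(proteome):
    """Return (protein name, 1-based phosphoacceptor position) per motif hit."""
    hits = []
    for name, sequence in proteome.items():
        for match in AKT_MOTIF.finditer(sequence):
            # The S/T is the sixth residue of the six-residue motif.
            hits.append((name, match.start() + 6))
    return hits


# Invented placeholder sequences, for illustration only.
toy_proteome = {
    "toy_protein_A": "MKRPRAASFE",
    "toy_protein_B": "MAAGLLEE",
}
print(candidate_sites(toy_proteome))
```

Each reported position identifies one potential "spot" on the array; whether the site is actually phosphorylated is what the array experiment then measures.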


The reversible phosphorylation of tyrosine residues is an important
mechanism for modulating biological processes such as cellular signaling,
differentiation, and growth, and if deregulated, can result in various types of cancer.
Therefore, an understanding of these dynamic cellular processes at the molecular
level requires the ability to assess changes in the sites of tyrosine phosphorylation
across numerous proteins simultaneously as well as over time. Thus in another
embodiment, the instant invention provides a method to identify the various signal
transduction pathways activated by a specific treatment of a sample, for example by
comparing a sample cell before and after treatment with a specific growth factor or
cytokine. The same method can also be used to compare the status of signal
transduction pathways in a diseased sample from a patient and a normal sample from
the same patient.

Knowledge about the various signal transduction pathways existing in
various organisms is accumulating at an astonishing pace. Science magazine's
STKE (Signal Transduction Knowledge Environment) maintains a comprehensive
and expanding list of known signal transduction pathways, their important
components, the relationships between the components (inhibition, stimulation, etc.),
and cross-talk between key members of the different pathways. The "Connections Map"
provides a dynamic graphical interface into a cellular signaling database, which
currently covers at least the following broad pathways: immune pathways (IL-4,
IL-13, Toll-like receptor); seven-transmembrane receptor pathways (Adrenergic,
PAC1 receptor, Dictyostelium discoideum cAMP Chemotaxis, Wnt/Ca2+/cyclic
GMP, G Protein-Independent 7 Transmembrane Receptor); Circadian Rhythm
pathway (murine and Drosophila); Insulin pathway; FAS pathway; TNF pathway;
G-Protein Coupled Receptor pathways; Integrin pathways; Mitogen-Activated
Protein Kinase Pathways (MAPK, JNK, p38); Estrogen Receptor Pathway;
Phosphoinositide 3-Kinase Pathway; Transforming Growth Factor-β (TGF-β)
Pathway; B Cell Antigen Receptor Pathway; Jak-STAT Pathway; STAT3 Pathway;
T Cell Signal Transduction Pathway; Type 1 Interferon (α/β) Pathway; Jasmonate
Biochemical Pathway; and Jasmonate Signaling Pathway. Many other well-known
signal transduction pathways not yet included are described in detail in other
scientific literature, which can be readily identified in PubMed or other common
search tools. Activation of most, if not all, of these signal transduction pathways is
generally characterized by changes in phosphorylation levels of one or more
members of each pathway.

Thus in a general sense, the status of any given number of signaling
pathways in a sample can be determined by taking a "snapshot" of the
phosphorylation status of one or more key members of these selected pathways. For
example, the mitogen-activated protein (MAP) kinase pathways are evolutionarily
conserved in eukaryotic cells. The pathways are essential for physiological
processes, such as embryonic development and immune response, and regulate cell
survival, apoptosis, proliferation, differentiation, and migration. In mammals, three
major classes of MAP kinases (MAPKs) have been identified, which differ in their
substrate specificity and regulation. These subgroups comprise the extracellular
signal-regulated kinases (ERKs), the c-Jun N-terminal kinases (JNKs), and the
p38/RK/CSBP kinases. ERKs are activated by a range of stimuli including growth
factors, cell adhesion, tumor-promoting phorbol esters, and oncogenes, whereas JNK
and p38 are preferentially activated by proinflammatory cytokines and a variety of
environmental stresses such as UV and osmotic stress. For this reason, the latter are
classified as stress-activated protein kinases. Activation of the MAPKs is achieved
by dual phosphorylation on threonine and tyrosine residues within a Thr-Xaa-Tyr
motif located in kinase subdomain VIII. This phosphorylation is mediated by a
dual specificity protein kinase, MAPK kinase (MAPKK), and MAPKK is in turn
activated by phosphorylation mediated by a serine/threonine protein kinase,
MAPKK kinase. In addition to these activating kinases, several types of protein
phosphatases have also been shown to control MAPK pathways by
dephosphorylating the MAPKs or their upstream kinases. These protein
phosphatases include tyrosine-specific phosphatases, serine/threonine-specific
phosphatases, and dual specificity phosphatases (DSPs). Therefore, the activities of
MAPKs can be regulated by upstream activating kinases and protein phosphatases,
and the activation status can be determined by the phosphorylation status of, for
example, ERK1/2, JNK, and p38.

Specifically, fragments of ERK1/2, JNK and p38 containing the signature
phosphorylation sites and PETs can be identified using the methods of the instant
invention. Capture agents specifically recognizing such phosphorylation
site-associated PETs can then be raised and immobilized on an array / chip. A sample
(treated or untreated, thus containing high or low levels of phosphorylation of these
pathway markers) can be digested and incubated with the chip, so as to determine
the presence / absence of activation, and degree, time course, duration of activation,
etc.

By the same principle, many other related or seemingly unrelated pathways
may be represented on the same chip, since each pathway may be represented by
from just one to possibly all of the known pathway components. This type of chip
may provide a comprehensive view of the various pathways that may be activated
after a drug treatment. Pathway-specific chips may also be used in conjunction to
further determine the status of individual components within a pathway of interest.

Because of the important functions of the kinases in virtually all kinds of
signal transduction pathways, it is not surprising to see that many drugs directly or
indirectly affect the phosphorylation status of various kinase substrates. Thus this type
of array may also be used in drug target identification. Briefly, samples treated by
different drug candidates may be incubated with the same kind of array to generate a
series of activation profiles of certain chosen targets. These profiles may be
compared, preferably automatically, to determine which drug candidate has the same
or similar activation profile as that of the lead molecule.
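
An automatic comparison of activation profiles could take many forms. The Python sketch below uses the Pearson correlation between each candidate's profile and that of the lead molecule as one simple similarity measure. The choice of metric, the function names, and the signal values are all assumptions made for illustration; the text does not prescribe a particular comparison method.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length activation profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0:
        raise ValueError("a profile with no variation cannot be correlated")
    return sum(a * b for a, b in zip(dx, dy)) / norm


def rank_candidates(lead_profile, candidate_profiles):
    """Sort candidates from most to least similar to the lead profile."""
    scores = {name: pearson(lead_profile, profile)
              for name, profile in candidate_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented phosphorylation signals for four array spots (arbitrary units).
lead = [1.0, 0.2, 0.8, 0.1]
candidates = {
    "candidate_A": [0.9, 0.3, 0.7, 0.2],
    "candidate_B": [0.1, 0.9, 0.2, 0.8],
}
ranking = rank_candidates(lead, candidates)
print(ranking)
```

Correlation ignores overall signal magnitude, so a candidate that produces the same pattern at a lower intensity still scores as similar; a distance-based measure would be the alternative if magnitude matters.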

This type of experiment will also yield useful information concerning the
selectivity of candidate drugs, since it can be easily determined whether a candidate
drug or drug analog actually has differential effects on various pathways, and if so,
whether the difference is significant.

The same type of experiment can also be adapted to screen for drug
candidates that lack undesired side effects or toxicity.

One aspect of this type of application relates to the selection of specific 
protease(s) for fragmentation. The following table presents data resulting from 
analysis of protease sensitivity of potential phosphorylation sites in the human 
"kinome" (all kinases). This table may aid the selection of proteases among the 
several most frequently used proteases. 


Enzymes                         Total Peptide    Peptide Fragments with S/T/Y
                                Fragments        <=10 aa           >10 aa

Chymotrypsin                    34,094           10,930 (43%)      14,985 (57%)
S.A. V-8 E specific Enzyme      34,233            6,753 (32%)      14,917 (68%)
Post-Proline Cleaving Enzyme    29,715            7,077 (37%)      12,224 (63%)
Trypsin                         54,260           15,217 (53%)      13,311 (47%)

(ii) Glycosylation

A wide variety of eukaryotic membrane-bound and secreted proteins are
glycosylated, that is, they contain covalently bound carbohydrate, and therefore are
termed glycoproteins. In addition, certain intracellular eukaryotic proteins are also
glycoproteins. Glycosylation of polypeptides in eukaryotes occurs principally in
three ways (Parekh et al., Trends Biotechnol. 7: 117, 1989). Glycosylation through a
glycosidic bond to an asparagine side-chain is known as N-glycosylation. Such
asparagine residues only occur in the amino acid triplet sequence of
Asn-Xaa-Ser/Thr, where Xaa can be any amino acid. The carbohydrate portion of a
glycoprotein is also known as a glycan. O-glycans are linked to serine or threonine
side-chains through O-glycosidic bonds. In human, 284,535 octamer tags contain
this NX(S/T) sequence, and 228,256 octamer PETs contain the NX(S/T) sequence.
The latter is about 2.6% of the total octamer peptide tags in human. N- and
O-linked glycosylation are two of the most complex post-translational modifications.
The polypeptide may also be linked to a phosphatidylinositol lipid anchor through a
carbohydrate "bridge", the whole assembly being known as the
glycosyl-phosphatidylinositol (GPI) anchor.
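
Counting octamer tags that contain the NX(S/T) triplet, as in the figures quoted above, amounts to sliding an eight-residue window along each sequence and testing each window for the pattern. The Python sketch below shows the idea on one invented sequence. The function names are hypothetical, the pattern follows the text's definition in which Xaa is any amino acid, and no proteome-wide uniqueness test (which is what distinguishes a PET from an ordinary tag) is attempted.

```python
import re

# N-glycosylation triplet as defined in the text: Asn, any residue, Ser/Thr.
SEQUON = re.compile(r"N.[ST]")


def octamers(sequence, length=8):
    """Return every overlapping window of the given length."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def sequon_octamers(sequence):
    """Return the octamers that contain at least one N-X-S/T triplet."""
    return [tag for tag in octamers(sequence) if SEQUON.search(tag)]


# Invented placeholder sequence, for illustration only.
toy_sequence = "MKNGSAALLQPE"
tags = octamers(toy_sequence)
hits = sequon_octamers(toy_sequence)
print(len(hits), "of", len(tags), "octamers contain the triplet:", hits)
```

Applied to every protein in a proteome and then filtered for tags that occur in only one protein, the same loop would yield the glycosylation-site-associated PET candidates.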

In recent years, the functional significance of the carbohydrate moieties has
been increasingly appreciated (Rademacher et al., Ann. Rev. Biochem. 57: 785,
1988). Carbohydrates covalently attached to polypeptide chains can confer many
functions to the glycoprotein, for example resistance to proteolytic degradation, the
transduction of information between cells, and intercellular adhesion through
ligand-receptor interactions (Gesundheit et al., J. Biol. Chem. 262: 5197, 1987; Ashwell &
Harford, Ann. Rev. Biochem. 51: 531, 1982; Podskalny et al., J. Biol. Chem. 261:
14076, 1986; Dennis et al., Science 236: 582, 1987). As glycoforms are the product
of a series of biochemical modifications, perturbations within a cell can have
profound effects on their structure. With the increase in understanding of
carbohydrate functions, the need for rapid, reliable and sensitive methods for
carbohydrate detection and analysis has grown considerably.

Lectins are proteins that interact specifically and reversibly with certain
sugar residues. Their specificity enables binding to polysaccharides and
glycoproteins (even agglutination of erythrocytes and tumor cells). The binding
reaction between a lectin and a specific sugar residue is analogous to the interaction
between an antibody and an antigen. Substances bound to lectin may be resolved
with a competitive binding substance or an ionic strength gradient. In addition,
among other procedures, lectins can be labeled with biotin or digoxigenin, and
subsequently detected by avidin-conjugated peroxidase or anti-digoxigenin
antibodies coupled with alkaline phosphatase, respectively (Carlsson SR: Isolation
and characterization of glycoproteins. In: Glycobiology. A Practical Approach.
Fukuda M and Kobata A (eds). Oxford University Press, Oxford, pp. 1-25, 1993,
incorporated herein by reference).

For example, Concanavalin A (Con A) binds molecules that contain
α-D-mannose, α-D-glucose and sterically related residues with available C-3, C-4, or C-5
hydroxyl groups. Like Con A, lentil lectin binds α-D-mannose, α-D-glucose, and
sterically related residues, but lentil lectin distinguishes less sharply between
glucosyl and mannosyl residues and binds simple sugars with lower affinity.
Wheat germ lectin, including its agarose-bound form, specifically binds to
N-acetyl-β-D-glucosaminyl residues.
Psathyrella velutina lectin (PVL) preferentially interacts with the
N-acetylglucosamine β1→2Man group. All these lectins can be used to detect the
presence of various kinds of glycosylated peptide fragments after these
PET-associated glycosylated peptide fragments are captured from the sample by capture
agents.

The GlycoTrack Kit from Glyko, Inc. (a Prozyme company, San Leandro,
CA) detects glycosylation by using a specific carbohydrate oxidation reaction prior to
binding of a high-amplification color-generating reagent. Briefly, a sample, either in
solution or already immobilized to a support, is oxidized with periodate. This
generates aldehyde groups that can react spontaneously with certain hydrazides at
room temperature in aqueous conditions. Use of biotin-hydrazide following
periodate oxidation leads to the incorporation of biotin into the carbohydrate (9).
The biotinylated compound is detected by reaction with a streptavidin-alkaline
phosphatase conjugate. Subsequent visualization is achieved using a substrate that
reacts with the alkaline phosphatase bound to glycoproteins on the membrane,
forming a colored precipitate.

Molecular Probes (Eugene, OR) offers a proprietary Pro-Q Emerald 300
fluorescent glycoprotein stain for detection of glycoproteins. The new Pro-Q
Emerald 300 fluorescent glycoprotein stain reacts with periodate-oxidized
carbohydrate groups, creating a bright green-fluorescent signal on glycoproteins.
Depending upon the nature and the degree of glycosylation, this stain may be
50-fold more sensitive than the standard periodic acid-Schiff base method using acidic
fuchsin dye. According to the manufacturer, detection using the Pro-Q Emerald 300
glycoprotein stain is much easier than detection of glycoproteins using biotin
hydrazide with streptavidin-horseradish peroxidase and ECL detection (Amersham
Pharmacia Biotech). The stain can detect 50 ng of a typical glycosylated protein.
Since the captured glycosylated PET-containing peptide fragments are much smaller
than a typical peptide, low nanogram to high picogram quantities of captured
peptides can be detected using this dye.

Thus, to detect and quantitate glycosylation in a sample, all
proteins or a subpopulation thereof which contain the potential glycosylation site
NXS/T may be identified, and peptide fragments resulting from a specific
pre-determined treatment may be analyzed to identify associated PETs. Capture agents
against these PETs can then be raised. In a method analogous to the phosphorylation
detection described above, glycosylation can be detected / quantitated using the
various detection methods described above.

(iii) Other Post-translational modifications 

Capture agents, such as antibodies specific for other post-translationally
modified residues, are also readily available.

There are at least 46 anti-ubiquitin commercial antibodies available from 14 
25 different vendors. For example, Cell Signaling Technology (Beverly, MA) offers 
mouse anti-Ubiquitin monoclonal antibody, clone P4D1 (IgGl isotype, Cat. No. 
3936), which is specific for all species of ubiquitin, polyubiquitin, and ubiquitinated 
peptides. 

Anti-acetylated amino acid antibodies have also been commercialized. See
anti-acetylated-histone H3 and H4 antibodies (Catalog # 06-599 and Catalog #
06-598) from Upstate Biotechnology (Lake Placid, NY). In fact, Alpha Diagnostic
International, Inc. (San Antonio, TX) offers custom synthesis of anti-acetylated
amino acid antibodies.

Arginine methylation, a protein modification discovered almost 30 years ago,
has recently experienced renewed interest as several new arginine
methyltransferases have been identified and numerous proteins have been found to be
regulated by methylation on arginine residues. Mowen and David published detailed
protocols on Science's STKE
(www.stke.org/cgi/content/full/OC_sigtrans;2001/93/pl1) that provide guidelines for
the straightforward identification of arginine-methylated proteins, made possible by
the availability of novel, commercially available reagents. Specifically, two
anti-methylated arginine antibodies are described: mouse monoclonal antibody to
methylarginine, clone 7E6 (IgG1) (Abcam, Cambridge, UK) (Data sheet:
www.abcam.com/public/ab_detail.cfm?intAbID=412), which reacts with mono- and
asymmetric dimethylated arginine residues; and mouse monoclonal antibody to
methylarginine, clone 21C7 (IgM) (Abcam) (Data sheet:
www.abcam.com/public/ab_detail.cfm?intAbID=413), which reacts with
asymmetric dimethylated arginine residues. Detailed protocols for in vitro and in
vivo analysis of arginine methylation are provided. See Mowen et al., Cell 104:
731-741, 2001.

Even if there are no reported antibodies at present for certain specific
modifications, it is well within the capability of a skilled artisan to raise antibodies
against that specific type of modified residue. There is no compelling reason to
believe that such antibodies cannot be obtained, especially in view of the prior
success in raising antibodies against relatively small groups such as phosphorylated
amino acids. The anti-post-translational modification antibody should be checked
against the same antigen in un-modified form to verify that the reactivity depends
upon the presence of the post-translational modification.

G. Immunohistochemistry (IHC)

Immunohistochemical analysis of tumor tissues / biopsies has traditionally
played an important role in the diagnosis, monitoring, and prognosis of cancer.
IHC is typically performed on disease tissue sections using antibodies (monoclonal
or polyclonal) to specific disease markers. However, two major problems have
hampered this useful procedure, such that it is frequently difficult to get
reproducible, quantitative data. One problem is associated with the poor quality of
antibodies used in the assay. Many antibodies lack specificity to a target biomarker,
and tend to cross-react with other proteins not associated with disease status,
resulting in high background. A further complication is that an antibody may have
difficulty accessing unknown epitopes after tissue/cell fixation.

For example, Press et al. (Cancer Res. 54(10): 2771-7, 1994) compared
immunohistochemical staining results obtained with 7 polyclonal and 21
monoclonal antibodies in sections from paraffin-embedded blocks of breast cancer
samples. It was found that the ability of these antibodies to detect HER2/neu
antigen overexpression was extremely variable, providing an important explanation
for the variable overexpression rate reported in the literature.

The other problem is associated with sample processing before IHC.
Generally, the efficiency of antigen retrieval is unpredictable in current
protocols. It is also reported that heating coupled with enzyme digestion tends to give
better results. But since the epitopes for the antibodies are not known, heating/digestion
may impair antibody recognition to differing degrees.

Therefore, PET-derived antibodies represent a unique solution as
standardized reagents for IHC. In certain preferred embodiments, PETs present on
the surface of the target protein will be chosen for easy accessibility by the
PET-specific antibodies. The chemistry of cell fixation may also be taken into account to
select optimum amino acid sequences of PETs. For example, if certain residues are
known to form cross-links after fixation, these residues will be selected against in
PET selection. Similarly, epitopes that overlap with enzyme recognition sites will
not be chosen. These measures will help to achieve consistent, reproducible results
and a high rate of success in IHC experiments.

VII. Use of Multiple PETs in Highly Accurate Functional Measurement of
Proteins

In certain embodiments of the invention, it may be advantageous to produce
two or more PETs for each protein of interest. For example, trypsin digestion (or any
other protease treatment or chemical fragmentation method described above) may
be incomplete or biased for / against certain fragments. Similarly, recovery of
fragmented polypeptides by PET-specific capture agents may occasionally be
incomplete and/or biased. Therefore, there may be certain risks associated with
using one specific PET-specific capture agent for measurement of a target
polypeptide.

To overcome this potential problem, or at least to compensate for the
above-described incomplete digestion / recovery problems, two or more PETs specific to
the polypeptide of interest may be generated, and used on the same array of the
instant invention, or used in the same set of competition assays to independently
detect different PETs of the same polypeptide. The averaged measurement results
obtained by using such redundant PET-specific capture agents should be much more
accurate and reliable than results obtained using a single PET-specific
capture agent.
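
As a rough illustration of the averaging step described above, the following Python sketch combines readings from several PET-specific capture agents directed at the same protein. The function name, the input layout, and the example values are assumptions made for illustration only; they are not taken from the specification.

```python
from statistics import mean, stdev


def combine_pet_readings(readings):
    """Average redundant PET readings for one target protein.

    `readings` maps a hypothetical PET label to the concentration
    inferred from that PET's capture agent (arbitrary units).
    Returns (mean, sample standard deviation).
    """
    values = list(readings.values())
    if len(values) < 2:
        raise ValueError("need at least two PET readings to average")
    return mean(values), stdev(values)


# Illustrative values only, not measured data.
avg, spread = combine_pet_readings({"PET1": 9.0, "PET2": 10.0, "PET3": 11.0})
```

A large spread between PETs of the same protein would suggest biased digestion or capture for at least one fragment, which is the risk the redundant design is meant to expose.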

On the other hand, certain proteins may have different forms within the same
biological sample. For example, proteins may be post-translationally modified at
one or more specific positions. There are more than 100 different kinds of
post-translational modifications, with the most common ones being acetylation,
amidation, deamidation, prenylation, formylation, glycosylation, hydroxylation,
methylation, myristoylation, phosphorylation, ubiquitination, ribosylation and
sulphation. For a specific type of modification, such as phosphorylation, a PET
peptide phosphorylated at a site may not be recognized by a capture agent raised
against the same but unphosphorylated PET peptide. Therefore, by comparing the
result of a first capture agent specific for the un-modified PET peptide of a target
protein (which represents unmodified target protein) with the result of a second
capture agent specific for another PET within the same target protein (which does
not contain any phosphorylation sites and thus represents the total amount of the
target protein), one can determine the percentage of phosphorylated target protein
within said sample.
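
The comparison described above reduces to simple arithmetic. The sketch below is a minimal illustration, assuming both capture-agent signals have already been calibrated to the same concentration units; the function name and example numbers are hypothetical.

```python
def percent_modified(unmodified_signal, total_signal):
    """Percentage of target protein carrying the modification.

    unmodified_signal: amount reported by the capture agent that binds
        only the un-modified PET.
    total_signal: amount reported by the capture agent for a second PET
        that contains no modification site (total target protein).
    Both values are assumed to be in the same calibrated units.
    """
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    if not 0 <= unmodified_signal <= total_signal:
        raise ValueError("unmodified_signal must lie between 0 and total_signal")
    return 100.0 * (1.0 - unmodified_signal / total_signal)


# Illustrative values: 30 units unmodified out of 120 units total.
pct = percent_modified(30.0, 120.0)
```

The same calculation would apply to any pair of forms distinguished by two PETs, such as processed and unprocessed forms.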

The same principle applies to all target proteins with different forms,
including unprocessed / pre-form and processed / mature form in certain growth
factors, cytokines, and proteases; alternative splicing forms; and all types of
post-translational modifications.

In certain embodiments, capture agents specific for different PETs of the
same target protein need not be of the same category (e.g., one could be an antibody
specific for PET1, the other could be a non-antibody binding protein for PET2, etc.).

In other embodiments, the presence or absence of one or more PETs is
indicative of certain functional states of the target protein. For example, some PETs
may be present only in unprocessed forms of certain proteins (such as peptide
hormones, growth factors, cytokines, etc.), but not present in the corresponding
mature / processed forms of the same proteins. This usually arises from the situation
where the processing site resides within the PET. On the other hand, other PETs
might be common to both processed and unprocessed forms (e.g., they do not contain any
processing sites). If both types of PETs are used in the same array, or in the same
competition assay, the abundance and ratio of processed / unprocessed target protein
can be assessed.

In other embodiments, due to the vastly improved overall accuracy of the
measurement using multiple PET-specific capture agents, the invention is applicable
to the detection of certain biomarkers previously considered unsuitable because their
detectable level is low (such as 1-5 pM) and easily obscured by background signals.
For example, as described above, Punglia et al. (N. Engl. J. Med. 349(4): 335-42,
July, 2003) indicated that, in the standard PSA-based screening for prostate cancer,
if the threshold PSA value for undergoing biopsy were set at 4.1 ng per milliliter, 82
percent of cancers in younger men and 65 percent of cancers in older men would be
missed. Thus a lower threshold level of PSA for recommending prostate biopsy,
particularly in younger men, may improve the clinical value of the PSA test.
However, at lower detection limits, background can become a significant issue. The
sensitivity / selectivity of the multiple PET-specific capture agent assay can be used
to reliably and accurately detect low levels of PSA.

Similarly, due to the increased accuracy of measurements, small changes in
concentration are more easily and reliably detected. Thus, the same method can also
be used for other proteins previously unrecognized as disease biomarkers, by
monitoring very small changes of protein levels very accurately. "Small changes"
refers to a change in concentration of no more than about 50%, 40%, 30%, 20%,
15%, 10%, 5%, 1% or less when comparing a disease sample with a normal / control
sample.

Accuracy of a measurement is usually defined by the degree of variation
among individual measurements when compared to the true value, which can be
reasonably accurately represented by the mean value of multiple independent
measurements. The more accurate a method is, the closer a random measurement
will be to the mean value. An x% accuracy measurement means that x%
of the measurements will be within one standard deviation of the mean value.
The method of the invention is usually at least about 70% accurate, preferably 80%,
90% or more accurate.
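
The accuracy figure as defined in this paragraph can be computed directly from a set of replicate measurements. The sketch below is illustrative only: the text does not say whether a sample or population standard deviation is intended, so the population form (`pstdev`) is an assumption, as are the function name and the example values.

```python
from statistics import mean, pstdev


def percent_within_one_sd(measurements):
    """Percentage of measurements within one standard deviation of the mean.

    Uses the population standard deviation; the choice between sample
    and population forms is an assumption, not stated in the text.
    """
    if not measurements:
        raise ValueError("no measurements supplied")
    centre = mean(measurements)
    sd = pstdev(measurements)
    within = sum(1 for x in measurements if abs(x - centre) <= sd)
    return 100.0 * within / len(measurements)


# Illustrative replicate readings, not measured data.
accuracy = percent_within_one_sd([9, 10, 10, 10, 11])
```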

Detection of the presence and amount of the captured PET-containing
polypeptide fragments can be effectuated using any of the methods described above
that are generally applicable for detecting / quantitating the binding event.

To reiterate, for example, for each primary capture agent on an array, a
specific, detectable secondary capture agent might be generated to bind the
PET-containing peptide to be captured by the primary capture agent. The secondary
capture agent may be specific for a second PET sequence on the to-be-captured
polypeptide analyte, or may be specific for a post-translational modification (such as
phosphorylation) present on the to-be-captured polypeptide analyte. To facilitate
detection / quantitation, the secondary capture agent may be labeled by a detectable
moiety selected from: an enzyme, a fluorescent label, a stainable dye, a
chemiluminescent compound, a colloidal particle, a radioactive isotope, a
near-infrared dye, a DNA dendrimer, a water-soluble quantum dot, a latex bead, a
selenium particle, or a europium nanoparticle.

Alternatively, the captured PET-containing polypeptide analytes may be
detected directly using mass spectrometry, colorimetric resonant reflection using a
SWS or SRVD biosensor, surface plasmon resonance (SPR), interferometry,
gravimetry, ellipsometry, an evanescent wave device, resonance light scattering,
reflectometry, a fluorescent polymer superquenching-based bioassay, or arrays of
nanosensors comprising nanowires or nanotubes.

Another aspect of the invention provides arrays comprising redundant
capture agents specific for one or more target proteins within a sample. Such arrays
are useful to carry out the methods described above (e.g., high accuracy functional
measurement of the target proteins). In one embodiment, several different capture
agents are arrayed to detect different PET-containing peptide fragments derived from
the same target protein. In other embodiments, the array may be used to detect
several different target proteins, at least some (but perhaps not all) of which may be
detected more than once by using capture agents specific for different PETs of those
target proteins.

Another aspect of the invention provides a composition comprising a 
plurality of capture agents, wherein each of said capture agents recognizes and 
interacts with one PET of a target protein. The composition can be used in an array 
format in an array device as described above. 

VIII. Other Aspects of the Invention 

In another aspect, the invention provides compositions comprising a plurality
of isolated unique recognition sequences, wherein the unique recognition sequences
are derived from at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or
100% of an organism's proteome. In one embodiment, each of the unique
recognition sequences is derived from a different protein.

The present invention further provides methods for identifying and/or
detecting a specific organism based on the organism's Proteome Epitope Tag. The
methods include contacting a sample containing an organism of interest (e.g., a
sample that has been fragmented using the methods described herein to generate a
collection of peptides) with a collection of unique recognition sequences that
characterize, and/or that are unique to, the proteome of the organism. In one
embodiment, the collection of unique recognition sequences that comprise the
Proteome Epitope Tag is immobilized on an array. These methods can be used to,
for example, distinguish a specific bacterium or virus from a pool of other bacteria
or viruses.

The unique recognition sequences of the present invention may also be used
in a protein detection assay in which the unique recognition sequences are coupled
to a plurality of capture agents that are attached to a support. The support is
contacted with a sample of interest and, in the situation where the sample contains a
protein that is recognized by one of the capture agents, the unique recognition
sequence will be displaced from the capture agent. The unique
recognition sequences may be labeled, e.g., fluorescently labeled, such that loss of
signal from the support would indicate that the unique recognition sequence was
displaced and that the sample contains a protein that is recognized by one or more of the
capture agents.

The PETs of the present invention may also be used in therapeutic
applications, e.g., to prevent or treat a disease in a subject. Specifically, the PETs
may be used as vaccines to elicit a desired immune response in a subject, such as an
immune response against a tumor cell, an infectious agent or a parasitic agent. In
this embodiment of the invention, a PET is selected that is unique to or is
over-represented in, for example, a tissue of interest, an infectious agent of interest or a
parasitic agent of interest. A PET is administered to a subject using art-known
techniques, such as those described in, for example, U.S. Patent No. 5,925,362 and
international publication Nos. WO 91/11465 and WO 95/24924, the contents of each
of which are incorporated herein by reference. Briefly, the PET may be administered
to a subject in a formulation designed to enhance the immune response. Suitable
formulations include, but are not limited to, liposomes with or without additional
adjuvants and/or cloning DNA encoding the PET into a viral or bacterial vector. The
formulations, e.g., liposomal formulations, incorporating the PET may also include
immune system adjuvants, including one or more of lipopolysaccharide (LPS), lipid
A, muramyl dipeptide (MDP), glucan or certain cytokines, including interleukins,
interferons, and colony stimulating factors, such as IL1, IL2, gamma interferon, and
GM-CSF.

EXAMPLES

This invention is further illustrated by the following examples, which should
not be construed as limiting. The contents of all references, patents and published
patent applications cited throughout this application, as well as the Figures, are
hereby incorporated by reference.

EXAMPLE 1: IDENTIFICATION OF UNIQUE RECOGNITION
SEQUENCES WITHIN THE HUMAN PROTEOME

As any one of the 20 amino acids could be at any specific position of a
peptide, the total number of possible combinations for a tetramer (a peptide containing 4 amino
acid residues) is 20^4; the total number of possible combinations for a pentamer (a peptide
containing 5 amino acid residues) is 20^5; and the total number of possible combinations for a
hexamer (a peptide containing 6 amino acid residues) is 20^6. In order to identify
unique recognition sequences within the human proteome, each possible tetramer,
pentamer or hexamer was searched against the human proteome (total number:
29,076; Source of human proteome: EBI Ensembl project release v 4.28.1 on Mar
12, 2002, http://www.ensembl.org/Homo_sapiens/).
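
The size of each tag space follows directly from the 20-letter amino acid alphabet. The short Python sketch below reproduces that arithmetic; the names `AMINO_ACIDS` and `tag_space_size` are introduced here for illustration and do not appear in the specification.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tag_space_size(k):
    """Number of distinct peptides of length k over the 20 amino acids."""
    return len(AMINO_ACIDS) ** k


# Tag-space sizes for tetramers, pentamers and hexamers.
sizes = {k: tag_space_size(k) for k in (4, 5, 6)}
```

These values (160,000; 3,200,000; 64,000,000) are the totals reported in the tag-space tables of this example.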

The results of this analysis, set forth below, indicate that using a pentamer as
a unique recognition sequence, 80.6% (23,446 sequences) of the human proteome
have their own unique recognition sequence(s). Using a hexamer as a unique
recognition sequence, 89.7% of the human proteome have their own unique
recognition sequence(s). In contrast, when a tetramer is used as a unique recognition
sequence, only 2.4% of the human proteome have their own unique recognition
sequence(s).

Results and Data 

2.1. Tetramer analysis:

2.1.1. Sequence space:

Total number of human protein sequences                 | 29,076 | 100%
*Number of sequences with 1 or more unique tetramer tag |    684 | 2.4%
Number of sequences with 0 unique tetramer tag          | 28,392 | 97.6%

*For these 684 sequences, average Tag/sequence: 1.1.

2.1.2. Tag space:

Total number of tetramers               | 20^4 = 160,000 | 100%
Tetramers found in 0 sequences          |            393 | 0.2%
#Tetramers found in 1 sequence only     |            745 | 0.5%
Tetramers found in more than 1 sequence |        158,862 | 99.3%

#: These are signature tetra-peptides.

2.2. Pentamer analysis:

2.2.1. Sequence space:

Total number of human protein sequences                 | 29,076 | 100%
*Number of sequences with 1 or more unique pentamer tag | 23,446 | 80.6%
Number of sequences with 0 unique pentamer tag          |  5,630 | 19.4%

*For these 23,446 sequences, average Tag/sequence: 23.9.

2.2.2. Tag space:

Total number of pentamers               | 20^5 = 3,200,000 | 100%
Pentamers found in 0 sequences          |          955,007 | 29.8%
#Pentamers found in 1 sequence only     |          560,309 | 17.5%
Pentamers found in more than 1 sequence |        1,684,684 | 52.6%

#: These are signature penta-peptides.

2.3. Hexamer analysis:

2.3.1. Sequence space:

Total number of human protein sequences                | 29,076 | 100%
*Number of sequences with 1 or more unique hexamer tag | 26,069 | 89.7%
Number of sequences with 0 unique hexamer tag          |  3,007 | 10.3%

*For these 26,069 sequences, average Tag/sequence: 177.

2.3.2. Tag space:

Total number of hexamers               | 20^6 = 64,000,000 | 100%
Hexamers found in 0 sequences          |        57,040,296 | 89.1%
#Hexamers found in 1 sequence only     |         4,609,172 | 7.2%
Hexamers found in more than 1 sequence |         2,350,532 | 3.7%

#: These are signature hexa-peptides.

A similar analysis of the human proteome was done for PET sequences of 7-10
amino acids in length, and the combined results are summarized in the table
below:

PET Length (Amino Acids) | Tagged Sequences (Number) | Tagged Sequences (% of total - 29,076) | Average PET (Number / Tagged Protein)
4  |    684 |  2.35% |   3
5  | 23,446 | 80.64% |  24
6  | 26,069 | 89.66% | 177
7  | 26,184 | 90.05% | 254
8  | 26,216 | 90.16% | 268
9  | 26,238 | 90.24% | 272
10 | 26,250 | 90.28% | 275

EXAMPLE 2: IDENTIFICATION OF UNIQUE RECOGNITION
SEQUENCES (OR PETS) WITHIN ALL BACTERIAL
PROTEOMES

In order to identify pentamer PETs that can be used to, for example,
distinguish a specific bacterium from a pool of all other bacteria, each possible
pentamer was searched against the NCBI database
(http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/eub_g.html, updated as of April 10,
2002). The results from this analysis are set forth below.

Results and Data:

Number of unique pentamers | Database ID (NCBI RefSeq ID) | Species Name
6 | NC_000922 | Chlamydophila pneumoniae CWL029
37 | NC_002745 | Staphylococcus aureus N315 chromosome
40 | NC_001733 | Methanococcus jannaschii small extra-chromosomal element
58 | NC_002491 | Chlamydophila pneumoniae J138
64 | NC_002179 | Chlamydophila pneumoniae AR39
135 | NC_000909 | Methanococcus jannaschii
206 | NC_003305 | Agrobacterium tumefaciens str. C58 (U. Washington) linear chromosome
298 | NC_002758 | Staphylococcus aureus Mu50 chromosome
356 | NC_002655 | Escherichia coli O157:H7 EDL933
386 | NC_003063 | Agrobacterium tumefaciens str. C58 (Cereon) linear chromosome
479 | NC_000962 | Mycobacterium tuberculosis
481 | NC_002737 | Streptococcus pyogenes
495 | NC_003304 | Agrobacterium tumefaciens str. C58 (U. Washington) circular chromosome
551 | NC_003098 | Streptococcus pneumoniae R6
567 | NC_003485 | Streptococcus pyogenes MGAS8232
577 | NC_002695 | Escherichia coli O157
592 | NC_003028 | Streptococcus pneumoniae TIGR4
702 | NC_003062 | Agrobacterium tumefaciens str. C58 (Cereon) circular chromosome
729 | NC_001263 | Deinococcus radiodurans chromosome 1
918 | NC_003116 | Neisseria meningitidis Z2491
924 | NC_000908 | Mycoplasma genitalium
960 | NC_002755 | Mycobacterium tuberculosis CDC1551
977 | NC_003112 | Neisseria meningitidis MC58
979 | NC_000921 | Helicobacter pylori J99
1015 | NC_000915 | Helicobacter pylori 26695
1189 | NC_000963 | Rickettsia prowazekii
1284 | NC_001318 | Borrelia burgdorferi chromosome
1331 | NC_002771 | Mycoplasma pulmonis
1426 | NC_000912 | Mycoplasma pneumoniae
1431 | NC_002528 | Buchnera sp. APS
1463 | NC_000868 | Pyrococcus abyssi
1468 | NC_000117 | Chlamydia trachomatis
1468 | NC_002162 | Ureaplasma urealyticum
1478 | NC_003212 | Listeria innocua
1553 | NC_003210 | Listeria monocytogenes
1577 | NC_000961 | Pyrococcus horikoshii
1630 | NC_002620 | Chlamydia muridarum
1636 | NC_003103 | Rickettsia conorii Malish 7
1769 | NC_003198 | Salmonella typhi
1794 | NC_000913 | Escherichia coli K12
1894 | NC_002689 | Thermoplasma volcanium
1996 | NC_003413 | Pyrococcus furiosus
2081 | NC_002578 | Thermoplasma acidophilum
2106 | NC_003197 | Salmonella typhimurium LT2
2137 | NC_003317 | Brucella melitensis chromosome I
2402 | NC_002677 | Mycobacterium leprae
2735 | NC_000918 | Aquifex aeolicus
2803 | NC_002505 | Vibrio cholerae chromosome 1
2900 | NC_000907 | Haemophilus influenzae
3000 | NC_003318 | Brucella melitensis chromosome II
3120 | NC_000854 | Aeropyrum pernix
3229 | NC_002662 | Lactococcus lactis
3287 | NC_002607 | Halobacterium sp. NRC-1
3298 | NC_003454 | Fusobacterium nucleatum
3497 | NC_001732 | Methanococcus jannaschii large extra-chromosomal element
3548 | NC_002163 | Campylobacter jejuni
3551 | NC_000853 | Thermotoga maritima
3688 | NC_003106 | Sulfolobus tokodaii
3775 | NC_002754 | Sulfolobus solfataricus
3842 | NC_000919 | Treponema pallidum
3921 | NC_003296 | Ralstonia solanacearum GMI1000
3940 | NC_000916 | Methanobacterium thermoautotrophicum
4165 | NC_001264 | Deinococcus radiodurans chromosome 2
4271 | NC_003047 | Sinorhizobium meliloti 1021 chromosome
4338 | NC_002663 | Pasteurella multocida
4658 | NC_003364 | Pyrobaculum aerophilum
5101 | NC_000917 | Archaeoglobus fulgidus
5787 | NC_003366 | Clostridium perfringens
5815 | NC_003450 | Corynebacterium glutamicum
6520 | NC_002696 | Caulobacter crescentus
6866 | NC_002506 | Vibrio cholerae chromosome 2
6891 | NC_003295 | Ralstonia solanacearum chromosome
7078 | NC_002488 | Xylella fastidiosa chromosome
8283 | NC_003143 | Yersinia pestis chromosome
8320 | NC_000911 | Synechocystis PCC6803
8374 | NC_002570 | Bacillus halodurans
8660 | NC_000964 | Bacillus subtilis
8994 | NC_003030 | Clostridium acetobutylicum ATCC824
11725 | NC_003552 | Methanosarcina acetivorans
12120 | NC_002516 | Pseudomonas aeruginosa
12469 | NC_002678 | Mesorhizobium loti
14022 | NC_003272 | Nostoc sp. PCC 7120

EXAMPLE 3: IDENTIFICATION OF SPECIFIC PETS 

Figure 11 outlines a general approach to identify all PETs of a given length
in an organism with a sequenced genome or a sample with a known proteome. Briefly,
all protein sequences within a sequenced genome can be readily identified using
routine bioinformatic tools. These protein sequences are parsed into short
overlapping peptides of 4-10 amino acids in length, depending on the desired length
of PET. For example, a protein of X amino acids gives (X-N+1) overlapping
peptides of N amino acids in length. Theoretically, all possible peptide tags for a
given length of, for example, N amino acids, can be represented as 20^N (preferably,
N = 4-10). This is the so-called peptide tag database for this particular length (N) of
peptide fragments. By comparing each and every sequence of the parsed short
overlapping peptides with the peptide tag database, all PETs (with one and only one
occurrence in the peptide tag database) can be identified, while all non-PETs (with
more than one occurrence in the peptide tag database) can be eliminated.
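
The parsing and comparison steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used to produce the reported results: the function names and the toy two-protein "proteome" are hypothetical, and a peptide is treated as a PET when it occurs in exactly one protein sequence (repeats within that same protein are allowed), which is one reading of "found in 1 sequence only".

```python
from collections import defaultdict


def overlapping_peptides(sequence, n):
    """Return the (X - N + 1) overlapping N-mers of a protein of length X."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def find_pets(proteome, n):
    """Map each protein ID to the sorted N-mers found in that protein only.

    `proteome` maps a protein ID to its amino acid sequence.
    """
    owners = defaultdict(set)  # peptide -> IDs of proteins containing it
    for protein_id, sequence in proteome.items():
        for peptide in overlapping_peptides(sequence, n):
            owners[peptide].add(protein_id)

    pets = defaultdict(list)
    for peptide, ids in owners.items():
        if len(ids) == 1:  # peptide occurs in exactly one protein
            pets[next(iter(ids))].append(peptide)
    return {protein_id: sorted(peps) for protein_id, peps in pets.items()}


# Toy sequences for illustration only; "CDEFG" is shared and so is not a PET.
toy_proteome = {"P1": "ACDEFG", "P2": "CDEFGH"}
toy_pets = find_pets(toy_proteome, 5)
```

Indexing only the peptides that actually occur avoids enumerating all 20^N possible tags, which matters for longer PETs where most of the tag space is empty.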

As indicated above, each possible tetramer, pentamer or hexamer was
searched against the human proteome (total number: 29,076; Source of human
proteome: EBI Ensembl project release 4.28.1 on Mar 12, 2002,
http://www.ensembl.org/Homo_sapiens/) to identify unique recognition sequences
(PETs).

Based on the foregoing searches, specific PETs were identified for the
majority of the human proteome. Figure 1 depicts the pentamer unique recognition
sequences that were identified within the sequence of the Interleukin-8 receptor A.
Figure 2 depicts the pentamer unique recognition sequences that were identified
within the Histamine H1 receptor that are not destroyed by trypsin digestion. Further
examples of pentamer unique recognition sequences that were identified within the
human proteome are set forth below.

25 


Sequence ID* 

Number 
of 

pentamer 
PETs 

Pentamer PETs 

ENSP0000000O233 

9 

AMPVS CATQG CFTVW ICFTV MPNAM PNAMP SRTWY 
TWYVQ WYVQA 

fSEQ ID NOs:l-9) 
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ENSP00000000412 

30 

CDFVC 

CGKEQ 

CWRIo 

Ul\ r IN c 



FYSCW 

GMEQF 

HLAFW 

IFNGS 

IMLIY 

IYIFR 

KGMEQ 

KTCDL 



MFPFY 

MISCN 


NWIML 

PFYSC 

QDCFY 

OFPHL 



RESWQ 

SNWIM 

VMISC 

YDNHC 

YIYIF 

YKGGD 

YLFEM 



YRGVG 

YSCWR 








(SEQ ID NOs: 

1 \J~Js ) 





ENSP00000000442 

2 

ASNEC PASNE 

(SEQ ID NOs: 

40-4 n 






9 

AQPWA 
YAQLW 

ASTWR 
YCCPV 

CLCLV 

FVICA 

LYCCP 

PRANR 

VNVLC 



(SEQ ID NOs: 

42-50) 





t—V TOnAAAAAAA 1 AAO 

ENSP0000000 1 UUo 

on 
zu 

AIQRM 

AKPNE 

AMCHL 

AWDIA 

CQQRI 

ELKYE 

EMPMI 

FVHYT 

HSIVY 

HYTGW 

LYANM 

MIGDR 

QKSNT 

SWEMN 



SWLEY 

TEMPM 

WEMNS 

YAKPN 

YESSF 

YPNNK 




(SEQ ID NOs: 

51-70) 





FNSP00000001 146 

32 

ATRDK 

CPCEG 

DKSCK 

DTHDT 

EWPRS 

FEVYQ 

FQIPK 

FSGYR 

GCPCE 

GHLFE 

HDTAP 

IFSHE 

KEM1M 

KxiUU 1 



KSCKL 

KYGNV 

LKHPT 

MGEHH 

MTMQE 

MYSIR 

NVFDP 



QLWQL 

RGIQA 

RYLDC 

STEWP 

THDTA 

mn m T"~i T% 

TRTFP 

VMYSI 



VRTCL 

VSTEW 

WQLRW 

WSVMY 






(SEQ ID NOs: 

71-102) 




ENSP00000001178 

8 

ACKCF 

CKCFW 

FWLWY 

KCFWL 

LWYPH 

QKRRC 

1,7 T T»TVD 

WLiWYr 

WYPHF 







(SEQ ID NOs 

103-110) 




ENSP0000000 1 380 

26 

AMEQT 

APCTI 

AYMER 

CTIMK 

DGLCN 

EQTWR 

FRSYG 

GMAYM 

GYHMP 

HIPNY 

KGRIP 

KLDMG 

MAYME 

MEQTW 



MNKRE 

PGMNK 

QGYHM 

TMSPK 

TWRLD 

VEQGY 

VNDGL 



WDQTR 

WRLDP 

YEAME 

YHMPC 

YNPCQ 





(SEQ ID NOs 

111-136) 




ENSP00000001567 

137 

ATYYK 

CATYY 

CDNPY 

CEWK 

CIKTD 

CINSR 

CKSPD 

CKSSN 

CNELP 

CQENY 

CSESF 

CYERE 

CYHFG 

CYMGK 



DFTWF 

DGWSA 

DIPIC 

DQTYP 

DREYH 

EEMHC 

EFDHN 



EFNCS 

EHGWA 

EINYR 

EKIPC 

EMHCS 

ESNTG 

ESTCG 



ESYAH 

EYHFG 

EYYCN 

FENAI 

FQYKC 

FTWFK 

GEWVA 



GNVFE 

GWTND 

HGRKF 

HGTIN 

HGWAQ 

HPGYA 

HPPSC 



HTVCI 

IHGVW 

IKHRT 

IMVCR 

INGRW 

IPCSQ 

IPVFM 



IVCGY 

IYKCR 

IYKEN 

KCNMG 

KGEWV 

KIPCS 

KPCDY 



KWSHP 

LPICY 

MENGW 

MGKWS 

MGYEY 

MIGHR 

NCSMA 



NDFTW 

NEGYQ 

NETTC 

NGWSD 

NMGYE 

NQNHG 

NSVQC 



NVFEY 

NYRDG 

NYREC 

PCDYP 

PEVNC 

PICYE 

PPQCE 



PPYYY 

PQCVA 

PYIPN 

QCYHF 

QIQLC 

QYKVG 

RDTSC 



REYHF 

RIKHR 

RKGEW 

RPCGH 

RVRYQ 

RWQSI 

SCDNP 



SDQTY 

SFTMI 

SITCG 

SRWTG 

STGWI 

SVEFN 

SWSDQ 



TAKCT 

TCIHG 

TCINS 

TCMEN 

TCYMG 

TMIGH 

TNDIP 



TSTGW 

TWFKL 

TYKCF 

VAIDK 

VCGYN 

VEFNC 

VFEYG 



VIMVC 

VNCSM 

VTYKC 

WDHIH 

WFKLN 

WIHTV 

WQSIP 



WSDQT 

WTNDI 

YCNPR 

YHENM 

YHFGQ 

YKCFE 

YKCNM 



YKCRP 

YKIEG 

YMGKW 

YNGWS 

YNQNH 

YPDIK 

YQCRN 



YQYGE 

YSERG 

YWDHI 

YYKMD 






(SEQ ID NOs 

: 137-274) 




ENSP00000001585 

25 

CVSKG 

EIIII 

GIN YE 

GMKHA 

GWDLK 

HGMKH 

HHPKF 

IEKCV 

IIMDA 

I NYE I 

KGYVF 

MEMIV 

MIVRA 

NYTIG 



QMEMI 

SHHPK 

TGSFR 

TRY KG 

VYGWD 

YGESK 

YGWDL 



YIHGM 

YNERE 

YTIGE 

YVFQM 






(SEQl 

[D NOs 

:275-299) 
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ENSP00000002125 

7 

GRYQR KNMGI 

(SEQ ID NOs 

MGERF PIKQH 

299-306) 

QRNAR 

RYQRN 

YDMLM 

ENSP00000002 1 65 

oi 

AHSAT 
EYSWN 

AKFFN 
FDQAK 

CKWGW 
FEWFH 

CMTID 
FNANQ 

DKLSW 
FWWYW 

DQAKF 
FYTCS 

DVWYT 
HKWEN 



HPKAI 

HQMPC 

HTWRS 

IHQMP 

IPKYV 

IYETH 

KFFNA 



KWENC 

KWGWA 

KWPTS 

LMNIG 

LPHKW 

MPCKW 

MRPQE 



NANQW 

NCMTI 

NYPPS 

NYQPE 

PCKWG 

PDQYW 

PHKWE 



QMGSW 

QYWNS 

RNRTD 

SCGGN 

SKHHE 

TCSDR 

THTWR 



TIHQM 

TNDRW 

TPDVW 

TRFDP 

TWTN 

VRGTV 

WTND 



WENCM 

WFDQA 

WFWWY 

WGSEY 

WGWAL 

WNWNA 

WRSQN 



WWYWQ 

YEDFG 

YETHT 

YNPGH 

YSWNW 

YVEFM 

YYSLF 



(SEQ ID NOs 

307-369) 




ENSP00000002494 

~1 A 

74 

AMNDA 

ANHGE 

AQWRN 

CVKLP 

CVQYK 

DAHKR 

DCVQY 

DIEQR 

DMAER 

DPDKW 

DTANH 

EVSFM 

EYVID 

FEQYE 



FFEQY 

FGDCV 

FMNET 

HEIYR 

HERFL 

HFDQT 

HKQWK 



HKRAF 

HTAMN 

HWIQQ 

KHFDQ 

KMLNQ 

KQMTS 

KQYAQ 



KRAFH 

KWERF 

LNGRW 

LPHWI 

MFATM 

MKFMN 

MKMEF 



MLNQS 

MPQEG 

MYVKA 

NLPHW 

NTDAH 

NVLKH 

PHWIQ 



PVMDA 

QADEM 

QENCK 

QHTAM 

\JN 1 VO 

ywivuij 




RVPVM 

SFYDS 

SHERF 

TCDEM 

TDAHK 

TKLMP 

TWRY 



TYQIL 

VMDAQ 

VMKFM 

VPVMD 

17DVT 17 

VRlijr 

17CCMKT 

vor rllM 

WUKlu 



WERFE 

WIIKY 

WIQQH 

WISTN 

WrvJJ X 1 

rtrsJ\Jl V 




YEVTY 

YGRRE 

YTDCV 

YVKAD 






(SEQ ID NOs: 369-443)




ENSP00000002594 

7 

CFKEN 

DGGFD 

FDLGD 

KLCFK 

KPMPN 

MPNPN 

PNPNH 


(SEQ ID NOs: 444-450)




FN9P00000009 SQfi 


DRCLH 

EEHYS 

EHYSH 

ENEVH 

EYFHE 

FFDWE 

FHEPN 

FSWPH 

FYNHM 

GRDRC 

GVAPN 

HEYFH 

HFFDW 

HIVDG 



HKPYP 

HMQKH 

HMQNW 

HPQVD 

HVHMQ 

KGRAH 

KHKPY 



KTPAY 

MQNWL 

NHMQK 

QKHKP 

QNWLR 

RVYSM 

SMNPS 



SWPHQ 

TFDWH 

TQVFY 

WEEHY 

YCLRD 

YHVHM 

YNHMQ 



YPSIE 









(SEQ ID NOs: 451-486)




PVTOTlAAAnAAAl OIA 

oU 

ADIRM 

AWPSF 

CLVNK 

CQAYG 

CTYVN 

DHDRM 

DPSFI 

DRMYV 

GHCCL 

GIETH 

GYWRH 

HCCLV 

HDINR 

HDRMY 



HQYCQ 

HRCQA 

IETHF 

IFYLE 

IHQYC 

IIHWA 

INFMR 



IQPWN 

KMPYP 

KWLFQ 

LIIHW 

LIQPW 

MCTYV 

MPYPR 



MRSHP 

NNFKH 

NPIRQ 

NSRWL 

NTTDY 

NYQWM 

PIRQC 



PRNRR 

PVKTM 

PWNRT 

QDYIF 

QGYWR 

QTAMR 

RCQAY 



RMVFN 

SKDYV 

SNANK 

TGAWP 

VGVTH 

VINFM 

VKWLF 



WDGQA 

WPSFP 

WRHVP 

YAGVY 

YCQGY 

YNPMC 

YNSRW 



YPLQR 

YQAVY 

YQWMP 

YWRHV 






(SEQ ID NOs: 487-546)





The Sequence IDs used are the ones provided in http://www.ensembl.org/Homo_sapiens/


Figure 12 lists the results of searching the whole human proteome (a total of
29,076 proteins, which correspond to about 12 million overlapping peptides of
4-10 amino acids) for PETs, and the number of PETs identified for each N
between 4 and 10.

Figure 13 shows the percentage of human proteins that have at least one PET.
For a PET of 4 amino acids in length, only 684 proteins (about 2.35% of the
total human proteins) have at least one 4-mer PET. However, if PETs of at
least 6 amino acids are used, at least about 90% of all proteins have at
least one PET. In addition, it is somewhat surprising that there is a
significant increase in the average number of PETs per protein from 5-mer
PETs to 6-mer (or longer) PETs (see lower panel of Figure 13), and that the
average quickly reaches a plateau when 7- or 8-mer PETs are used. These data
indicate that PETs of at least 6 amino acids, preferably 7-9 amino acids,
most preferably 8 amino acids, have the optimal length for most
applications. It is easier to identify a useful PET of that length, partly
because of the large average number of PETs per protein when a PET of that
length is sought.
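
The length dependence described above can be illustrated with a minimal Python sketch. It is not the search program used to produce Figures 12 and 13; the function names `find_pets` and `tagged_fraction` and the toy sequences are hypothetical, and the sketch assumes that a PET is simply a k-mer that occurs in exactly one protein of the set.

```python
from collections import defaultdict

def find_pets(proteome, k):
    """Map each protein id to the k-mers that occur in that protein only.

    proteome: dict of protein id -> amino acid sequence (hypothetical input).
    Assumption: a PET is a k-mer found in exactly one protein of the set.
    """
    owners = defaultdict(set)  # k-mer -> ids of the proteins containing it
    for pid, seq in proteome.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(pid)
    pets = {pid: set() for pid in proteome}
    for kmer, pids in owners.items():
        if len(pids) == 1:
            pets[next(iter(pids))].add(kmer)
    return pets

def tagged_fraction(proteome, k):
    """Fraction of proteins that carry at least one PET of length k."""
    pets = find_pets(proteome, k)
    return sum(1 for found in pets.values() if found) / len(proteome)

# Toy three-protein "proteome", for illustration only.
toy = {"P1": "MKTAYIAK", "P2": "MKTAYLLK", "P3": "MKTA"}
for k in (3, 4, 5):
    print(k, tagged_fraction(toy, k))
```

Running the same two functions over a full proteome for each k between 4 and 10 would give the kind of per-length summary that Figures 12 and 13 report.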

Figure 14 provides further data resulting from a tryptic digest of the human
proteome. Specifically, the top panel lists the average number of PETs per
tagged protein (a protein with at least one PET), with or without trypsin
digestion. Trypsin digestion reduces the average number of PETs per tagged
protein by roughly 1/3 to 1/2. The bottom right panel shows the distribution
of tryptic fragments in the human proteome, listed according to peptide
length. On average, a typical tryptic fragment is about 8.5 amino acids in
length. The bottom left panel shows the distribution of the number of tryptic
fragments generated from human proteins. On average, a human protein has
about 49 tryptic fragments.
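
An in silico tryptic digest of the kind summarized above can be sketched as follows. This is an illustration only: it assumes the commonly used rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) except when the next residue is proline (P), which the text above does not state, and the function names are hypothetical.

```python
def tryptic_digest(seq):
    """Split a protein sequence into tryptic fragments.

    Assumed rule (not stated in the specification): cleave after K or R
    unless the following residue is P.
    """
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        is_last = i == len(seq) - 1
        if residue in "KR" and not is_last and seq[i + 1] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # C-terminal fragment
    return fragments

def mean_fragment_length(seqs):
    """Average tryptic fragment length over a collection of sequences."""
    lengths = [len(f) for s in seqs for f in tryptic_digest(s)]
    return sum(lengths) / len(lengths)

print(tryptic_digest("AAKGGRPCCKDD"))
```

Applying `tryptic_digest` to every protein in a proteome and tabulating fragment counts and lengths would yield distributions of the type shown in the bottom panels of Figure 14.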


Example 6 below provides a detailed example of identifying SARS virus- 
specific 8-mer PETs. These PETs are potentially useful as SARS-specific antigens 
for immunization (vaccine production) in human or other mammals. 

EXAMPLE 4: DETECTION AND QUANTITATION IN A COMPLEX
MIXTURE OF A SINGLE PEPTIDE SEQUENCE WITH
TWO NON-OVERLAPPING PET SEQUENCES USING
SANDWICH ELISA ASSAY

A fluorescence sandwich immunoassay for specific capture and quantitation
of a targeted peptide in a complex peptide mixture is illustrated herein.

In the example shown here, a peptide consisting of three commonly used
affinity epitope sequences (the HA tag, the FLAG tag and the MYC tag) is mixed
with a large excess of unrelated peptides from digested human protein samples
(Figure 15). The FLAG epitope in the middle of the target peptide is first
captured by the FLAG antibody, then the labeled antibody (either HA mAb or
MYC mAb) is used to detect the second epitope. The final signal is detected by
fluorescence readout from the secondary antibody. Figure 15 shows that
picomolar concentrations of HA-FLAG-MYC peptide were detected in the presence
of a billion-fold molar excess of digested unrelated proteins. The detection
limit of this method is typically about 10 pM or less.

The sandwich assay was used to detect a tagged human PSA protein, both as
full-length protein secreted in conditioned media of cell cultures, and as
tryptic peptides generated by digesting the same conditioned media. The result
of this analysis is shown in Figure 16. The PSA protein sandwich assay (left
side of the figure) indicated that the PSA protein concentration is about
7.4 nM in conditioned media. SDS-PAGE analysis indicated that the tryptic
digestion of all proteins in the sample was complete, manifested by the
absence of any visible bands on the gel after digestion, since most tryptic
fragments are expected to be less than 1 kDa. The right side of the figure
indicated that nearly the same concentration (8 nM) of the last fragment, the
tag-containing portion of the recombinant PSA protein, was present in the
digested sample. The higher concentration could be attributed to the
elimination of interfering substances in the sample, such as other proteins
that bind the full-length PSA protein and mask its interaction with the
antibody. Although this type of interference is not so severe in this example,
since relatively simple conditioned media was used, it is expected to be much
more prevalent in real biological samples, where large interference is
expected from unknown proteins in a non-digested and complicated bodily fluid
such as serum.

The same sandwich assay may be used for detecting modified amino acids,
such as phosphorylated proteins using anti-tyrosine, anti-serine, or
anti-threonine antibodies. For example, Figure 17 shows that the
phosphoprotein SHIP-2 contains a 28-amino acid tryptic fragment, which is
phosphorylated on one tyrosine residue N-terminal to an 8-mer PET (YVLEGVPH)
and on one serine residue C-terminal to the PET. Thus in the sandwich assay,
the trypsin-digested SHIP-2 protein can first be pulled down using the
PET-specific antibody, and the presence of phosphorylated tyrosine or serine
may be detected / quantitated using the phospho-specific antibodies, such as
those described elsewhere in the instant specification. Three of the nearest
neighbors of the selected PET are also shown in the figure.

Similarly, the phosphoprotein ABL also contains an 8-mer PET on its tryptic
fragment containing the phosphorylation site. The phosphorylated peptide is
readily detectable by a phospho-tyrosine-specific antibody.

In fact, as a general approach, the sandwich assay may be used to detect N
proteins with N+1 PET-specific antibodies: one PET is common to all N peptides
to be detected, while each specific peptide also contains a unique PET. All N
peptides can be pulled down by a capture agent specific to the common PET, and
the presence and quantity of each specific peptide can be individually
assessed using antibodies specific to the unique PETs (see Figure 18).

To illustrate, most kinases are related by sharing similar catalytic
structures and/or catalytic mechanisms. Thus, it is interesting that only 88
5-mer PETs are needed to represent all 518 known human kinases, and 122 6-mer
PETs are needed for the same purpose. Figure 18 also shows that the 20 most
common 6-mer PETs cover more than 70% of all known kinases. Since closely
related kinases tend to share common features, the subject sandwich assay is
suitable for simultaneous detection of a family of kinases. Figure 19 provides
such an example, wherein one 5-mer PET is shared among tryptic fragments of 22
related kinases, each of which also has unique 7-mer or 8-mer PETs.

The same approach may be used for other protein families, including
GPCRs, proteases, phosphatases, receptors, or specific enzymes. The Human
Plasma Membrane Receptome is disclosed at http://receptome.stanford.edu/HPMR.


EXAMPLE 5: PEPTIDE COMPETITION ASSAY 

In certain embodiments of the invention, a peptide competition assay may be 
used to determine the binding specificity of a capture agent towards its target PET, 
as compared to several nearest neighbor sequences of the PET. 

For a typical peptide competition assay, the following illustrative protocol
may be used: 1 ng/100 nl/well of each target peptide is coated in MaxiSorp
plates with coating buffer (carbonate buffer, pH 9.6) overnight at 4°C, or 1
hour at room temperature. The plates are washed 4 times with 300 µl of PBST
(1x PBS / 0.05% Tween 20). Then 300 µl of blocking buffer (2% BSA / PBST) is
added and the plates are incubated for 1 hour at room temperature. Following
blocking, the plates are washed 4 times with 300 µl of PBST.

Synthesized competition peptides are dissolved in water to a final
concentration of 2 mM. Serial dilutions of the competition peptides (for
example, from 100 pM to 100 µM) in digested human serum are prepared. These
competition peptides at particular concentrations are then mixed with equal
amounts of primary antibodies against the target peptide. These mixtures are
then added to plate wells with the respective immobilized target peptides.
Binding is allowed to proceed for 2 hours at room temperature. The plates are
washed 4 times with 300 µl of PBST. Then labeled secondary antibody against
the primary antibody, such as 100 µl of 5,000x diluted anti-rabbit-IgG-HRP, is
added and incubated for 1 more hour at room temperature. The plates are washed
6 times with 300 µl of PBST. For detection of the HRP label activity, 100 µl
of TMB substrate (for HRP) is added and incubated for 15 minutes at room
temperature. Then 100 µl of stop buffer (2 N HCl) is added and the plates are
read at OD450. A peptide competition curve is plotted using the absorbance at
450 nm (OD450) versus the competitor peptide concentrations.

EXAMPLE 6: IDENTIFICATION OF SARS-SPECIFIC PETS 

Sequence Retrieval

A total of 2028 Coronavirus peptide sequences were obtained from the NCBI
database (http://www.ncbi.nlm.nih.gov:80/genomes/SARS/SARS.html). These
sequences represent at least 10 different species of Coronavirus. Among them,
1098 non-redundant peptide sequences were identified. Each sequence that
appeared identically within (was subsumed in) a larger sequence was removed,
leaving the larger sequence as the representative. The resulting sequences
were then broken up into overlapping regions of eight amino acids (8-mers),
with an offset of 1 amino acid between successive 8-mers. These 8-mers were
then queried against a database consisting of all 8-mers similarly generated
and present in the proteome of the species in question (or any other set of
protein sequences deemed necessary). 8-mers found to be present only once (the
sequence identified only itself) were considered unique. The remainder of the
sequences were initially classified as non-unique, with the understanding that
with more in-depth analysis, they might actually be as useful as those
sequences initially determined to be unique. For example, an 8-mer may be
present in another isoform of its parent sequence, so it would still be useful
in uniquely detecting that parental sequence and that isoform from all other
unrelated proteins.

A total of ~650,000 8-mer peptide sequences were generated, ~50,000 of
which were determined to be PETs. Among these, 605 were SARS-specific and 602
were PETs relative to human.
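
The retrieval steps described in this section (removing subsumed sequences, generating overlapping 8-mers, and querying them against a reference set) can be sketched in Python. This is an illustrative outline under stated assumptions, not the program actually used; the function names are hypothetical, and "specific" is taken here to mean simply "absent from the 8-mers of the background sequences".

```python
def drop_subsumed(seqs):
    """Remove duplicates and any sequence contained within a longer one."""
    kept = []
    for s in sorted(set(seqs), key=len, reverse=True):
        if not any(s in longer for longer in kept):
            kept.append(s)
    return kept

def kmers(seq, k=8):
    """Overlapping k-mers with an offset of one residue."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def specific_kmers(target_seqs, background_seqs, k=8):
    """k-mers of the target sequences that never occur in the background."""
    background = {m for s in background_seqs for m in kmers(s, k)}
    targets = {m for s in target_seqs for m in kmers(s, k)}
    return sorted(targets - background)

# Toy sequences, for illustration only.
viral = drop_subsumed(["MKTAYIAKQR", "TAYIAK", "MKTAYIAKQR"])
print(viral)
print(specific_kmers(viral, ["TAYIAKQRLL"]))
```

Sorting by length before the containment check keeps the longer sequence as the representative, matching the rule stated above.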

PET Prioritization: 

Once PETs have been identified, the best candidates for a particular
application must be chosen from the pool of all PETs.

Generally, PETs are ranked based upon calculations used to predict their
hydrophobicity, antigenicity, and solubility, with hydrophilic, antigenic, and
soluble PETs given the highest priority. The PETs are then further ranked by
determining each PET's nearest neighbors (similar-looking 8-mers with at least
one sequence difference) in the proteome(s) in question. A matrix calculation
is performed using a BLOSUM, PAM, or a similar proprietary matrix to determine
sequence similarity and distance. PETs with the most distant nearest neighbors
are given priority.
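
The nearest-neighbor ranking can be illustrated with a short Python sketch. As a deliberate simplification, it uses Hamming distance (the number of mismatched positions) in place of the BLOSUM or PAM matrix calculation described above, so it shows the ranking logic only; the function names and toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length peptides differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor_distance(pet, background_kmers):
    """Distance from a PET to its closest non-identical background k-mer.

    Hamming distance stands in here for a substitution-matrix score.
    Returns len(pet) when the background holds no comparable k-mer.
    """
    distances = [hamming(pet, m) for m in background_kmers
                 if len(m) == len(pet) and m != pet]
    return min(distances, default=len(pet))

def rank_pets(pets, background_kmers):
    """Rank PETs so that those with the most distant neighbors come first."""
    scored = [(p, nearest_neighbor_distance(p, background_kmers)) for p in pets]
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Toy peptides, for illustration only.
background = ["AAAAAAAC", "CCCCCCCC"]
print(rank_pets(["AAAAAAAA", "CCCCGGGG"], background))
```

Replacing `hamming` with a matrix-based score would bring the sketch closer to the described calculation without changing the ranking step.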

The parental peptide sequence is then proteolytically cleaved in silico and
the resulting fragments are sorted by user-defined size / hydrophobicity /
antigenicity / solubility criteria. The presence of PETs in each fragment is
assessed, and fragments containing no PETs are discarded. The remaining
fragments are analyzed in terms of PET placement within them, depending upon
the requirements of the type of assay to be performed. For example, a sandwich
assay prefers two non-overlapping PETs in a single fragment. The ideal final
choice would be the most antigenic PETs with only distantly-related nearest
neighbors in an acceptable proteolytic fragment that fits the requirements of
the assay to be performed.

Figure 20 shows two SARS-specific PETs and their nearest neighbors in
both the human proteome and the related Coronaviruses.

All SARS-specific PETs identified using this method are listed below in Table
SARS.

Table SARS. List of SARS virus-specific PETs

>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] ISLCSCIC
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] SLCSCICT
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] LCSCICTV
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] CSCICTVV
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] SCICTVVQ
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] CICTVVQR
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] ICTVVQRC
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] CTVVQRCA
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] HVLEDPCK
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] VLEDPCKV
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] LEDPCKVQ
>gi|30795153|gb|AAP41045.1| Orf10 [SARS coronavirus Tor2] EDPCKVQH
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] MNELTLID
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] NELTLIDF
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] ELTLIDFY
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LTLIDFYL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] TLIDFYLC
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LIDFYLCF
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] IDFYLCFL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] DFYLCFLA
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] FYLCFLAF
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] YLCFLAFL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LCFLAFLL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] CFLAFLLF
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] FLAFLLFL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LAFLLFLV
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] AFLLFLVL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] FLLFLVLI
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LLFLVLIM
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LFLVLIML
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] FLVLIMLI
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LVLIMLII
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] VLIMLIIF
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LIMLIIFW
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] IMLIIFWF
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] MLIIFWFS
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LIIFWFSL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] IIFWFSLE
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] IFWFSLEI
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] FWFSLEIQ
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] WFSLEIQD
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] FSLEIQDL
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] SLEIQDLE
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LEIQDLEE
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] EIQDLEEP
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] IQDLEEPC
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] QDLEEPCT
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] DLEEPCTK
>gi|32187352|gb|AAP72981.1| Orf7b [SARS coronavirus HSR 1] LEEPCTKV
>gi|32187350|gb|AAP72979.1| Orf6 [SARS coronavirus HSR 1] DEEPMELB
>gi|32187350|gb|AAP72979.1| Orf6 [SARS coronavirus HSR 1] EEPMELBY
>gi|32187350|gb|AAP72979.1| Orf6 [SARS coronavirus HSR 1] EPMELBYP
>gi|30023959|gb|AAP13572.1| unknown [SARS coronavirus CUHK-W1] DEEPMELD
>gi|30023959|gb|AAP13572.1| unknown [SARS coronavirus CUHK-W1] EEPMELDY
>gi|30023959|gb|AAP13572.1| unknown [SARS coronavirus CUHK-W1] EPMELDYP
>gi|30275674|gb|AAP30035.1| putative uncharacterized protein 3 [SARS coronavirus BJ01] SELDDEEL
>gi|30275674|gb|AAP30035.1| putative uncharacterized protein 3 [SARS coronavirus BJ01] ELDDEELM
>gi|30275674|gb|AAP30035.1| putative uncharacterized protein 3 [SARS coronavirus BJ01] LDDEELME
>gi|30275674|gb|AAP30035.1| putative uncharacterized protein 3 [SARS coronavirus BJ01] DDEELMEL
>gi|30275674|gb|AAP30035.1| putative uncharacterized protein 3 [SARS coronavirus BJ01] DEELMELD
>gi|30275674|gb|AAP30035.1| putative uncharacterized protein 3 [SARS coronavirus BJ01] EELMELDY
>gi|30275674|gb|AAP30035.1| putative uncharacterized protein 3 [SARS coronavirus BJ01] ELMELDYP
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] MLPPCYNF
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LPPCYNFL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] PPCYNFLK
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] PCYNFLKE
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] CYNFLKEQ
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] YNFLKEQH
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] NFLKEQHC
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] FLKEQHCQ
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LKEQHCQK
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] KEQHCQKA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] EQHCQKAS
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] QHCQKAST
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] HCQKASTQ
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] CQKASTQR
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] QKASTQRE
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] KASTQREA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] ASTQREAE
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] STQREAEA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] TQREAEAA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] QREAEAAV
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] REAEAAVK
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] EAEAAVKP
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] AEAAVKPL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] EAAVKPLL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] AAVKPLLA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] AVKPLLAP
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] VKPLLAPH
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] KPLLAPHH
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] PLLAPHHV
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LLAPHHVV
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LAPHHVVA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] APHHVVAV
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] PHHVVAVI
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] HHVVAVIQ
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] HVVAVIQE
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] VVAVIQEI
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] VAVIQEIQ
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] AVIQEIQL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] VIQEIQLL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] IQEIQLLA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] QEIQLLAA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] EIQLLAAV
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] IQLLAAVG
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] QLLAAVGE
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LLAAVGEI
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LAAVGEIL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] AAVGEILL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] AVGEILLL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] VGEILLLE
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] GEILLLEW
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] EILLLEWL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] ILLLEWLA
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LLLEWLAE
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LLEWLAEV
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LEWLAEVV
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] EWLAEVVK
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] WLAEVVKL
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] LAEVVKLP
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] AEVVKLPS
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] EVVKLPSR
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] VVKLPSRY
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] VKLPSRYC
>gi|31747859|gb|AAP69660.1| uncharacterized protein 9c [SARS coronavirus ZJ-HZ01] KLPSRYCC
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] VLLFLAFM
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] LLFLAFMV
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] LFLAFMVF
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] FLAFMVFL
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] LAFMVFLL
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] AFMVFLLV
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] FMVFLLVT
>gi|31416298|gb|AAP51230.1| envelope protein E [SARS coronavirus GZ01] MVFLLVTL
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] VLLFLAFV
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] LLFLAFVV
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] LFLAFVVF
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] FLAFVVFL
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] LAFVVFLL
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] AFVVFLLV
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] FVVFLLVT
>gi|29836499|ref|NP_828854.1| small envelope protein; protein sM; protein E [SARS coronavirus] VVFLLVTL
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] MCLKILVR
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] CLKILVRY
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] LKILVRYN
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] KILVRYNT
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] ILVRYNTR
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] LVRYNTRG
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] VRYNTRGN
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] TAAFRDVL
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] AAFRDVLV
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] AFRDVLVV
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] FRDVLVVL
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] RDVLVVLN
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] DVLVVLNK
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] VLVVLNKR
>gi|32187354|gb|AAP72983.1| Orf8b [SARS coronavirus HSR 1] LVVLNKRT
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] MDPNQTNV
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] DPNQTNVV
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] PNQTNVVP
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] NQTNVVPP
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] QTNVVPPA
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] TNVVPPAL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] NVVPPALH
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] VVPPALHL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] VPPALHLV
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] PPALHLVD
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] PALHLVDP
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] ALHLVDPQ
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LHLVDPQI
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] HLVDPQIQ
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LVDPQIQL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] VDPQIQLT
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] DPQIQLTI
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] PQIQLTIT
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] QIQLTITR
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] IQLTITRM
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] QLTITRME
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LTITRMED
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] TITRMEDA
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] ITRMEDAM
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] TRMEDAMG
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] RMEDAMGQ
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] MEDAMGQG
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] EDAMGQGQ
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] DAMGQGQN
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] AMGQGQNS
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] MGQGQNSA
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] GQGQNSAD
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] QGQNSADP
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] GQNSADPK
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] QNSADPKV
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] NSADPKVY
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] SADPKVYP
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] ADPKVYPI
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] DPKVYPII
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] PKVYPIIL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] KVYPIILR
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] VYPIILRL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] YPIILRLG
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] PIILRLGS
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] IILRLGSQ
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] ILRLGSQL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LRLGSQLS
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] RLGSQLSL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LGSQLSLS
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] GSQLSLSM
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] SQLSLSMA
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] QLSLSMAR
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LSLSMARR
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] SLSMARRN
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LSMARRNL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] SMARRNLD
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] MARRNLDS
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] ARRNLDSL
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] RRNLDSLE
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] RNLDSLEA
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] NLDSLEAR
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] LDSLEARA
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01] DSLEARAF
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
SLEARAFQ

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
LEARAFQS

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
EARAFQST

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
ARAFQSTP

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
RAFQSTPI

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
AFQSTPIV

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
FQSTPIVV

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
QSTPIVVQ

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
STPIVVQM

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
TPIVVQMT

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
PIVVQMTK

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
IVVQMTKL

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
VVQMTKLA

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
VQMTKLAT

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
QMTKLATT

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
MTKLATTE

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
TKLATTEE

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
KLATTEEL

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
LATTEELP

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
ATTEELPD

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
TTEELPDE

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
TEELPDEF

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
EELPDEFV

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
ELPDEFVV
>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
LPDEFVVV

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
PDEFVVVT

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
DEFVVVTA

>gi|31416303|gb|AAP51235.1| BGI-PUP7 [SARS coronavirus GZ01]
EFVVVTAK

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
ISLCSCIR

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
SLCSCIRT

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
LCSCIRTV

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
CSCIRTVV

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
SCIRTVVQ

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
CIRTVVQR

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
IRTVVQRC

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
RTVVQRCA

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
HVLEDPCP

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
VLEDPCPT

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
LEDPCPTG

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
EDPCPTGY

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
DPCPTGYQ

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
PCPTGYQP

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
CPTGYQPE

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
PTGYQPEW

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
TGYQPEWN

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
GYQPEWNI

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
YQPEWNIR

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
QPEWNIRY
>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
PEWNIRYN

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
EWNIRYNT

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
WNIRYNTR

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
NIRYNTRG

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
IRYNTRGN

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
TAAFRDVF

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
AAFRDVFV

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
AFRDVFVV

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
FRDVFVVL

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
RDVFVVLN

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
DVFVVLNK

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
VFVVLNKR

>gi|31416304|gb|AAP51236.1| BGI-PUP(GZ29-nt-Ins) [SARS coronavirus GZ01]
FVVLNKRT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
MKIILFLT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
KIILFLTL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
IILFLTLI

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ILFLTLIV

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LFLTLIVF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FLTLIVFT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LTLIVFTS

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TLIVFTSC

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LIVFTSCE

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
IVFTSCEL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
VFTSCELY
>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FTSCELYH

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TSCELYHY

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
SCELYHYQ

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
CELYHYQE

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ELYHYQEC

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LYHYQECV

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
YHYQECVR

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
HYQECVRG

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
YQECVRGT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
QECVRGTT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ECVRGTTV

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
CVRGTTVL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
VRGTTVLL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
RGTTVLLK

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
GTTVLLKE

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TTVLLKEP

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TVLLKEPC

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
VLLKEPCP

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LLKEPCPS

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LKEPCPSG

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
KEPCPSGT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
EPCPSGTY

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
PCPSGTYE

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
CPSGTYEG
>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
PSGTYEGN

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
SGTYEGNS

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
GTYEGNSP

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TYEGNSPF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
YEGNSPFH

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
EGNSPFHP

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
GNSPFHPL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
NSPFHPLA

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
SPFHPLAD

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
PFHPLADN

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FHPLADNK

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
HPLADNKF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
PLADNKFA

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LADNKFAL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ADNKFALT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
DNKFALTC

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
NKFALTCT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
KFALTCTS

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FALTCTST

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ALTCTSTH

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LTCTSTHF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TCTSTHFA

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
CTSTHFAF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TSTHFAFA
>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
STHFAFAC

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
THFAFACA

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
HFAFACAD

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FAFACADG

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
AFACADGT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FACADGTR

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ACADGTRH

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
CADGTRHT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ADGTRHTY

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
DGTRHTYQ

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
GTRHTYQL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TRHTYQLR

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
RHTYQLRA

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
HTYQLRAR

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
TYQLRARS

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
YQLRARSV

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
QLRARSVS

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LRARSVSP

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
RARSVSPK

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ARSVSPKL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
RSVSPKLF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
SVSPKLFI

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
VSPKLFIR

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
SPKLFIRQ
>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
PKLFIRQE

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
KLFIRQEE

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LFIRQEEV

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FIRQEEVQ

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
IRQEEVQQ

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
RQEEVQQE

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
QEEVQQEL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
EEVQQELY

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
EVQQELYS

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
VQQELYSP

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
QQELYSPL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
QELYSPLF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ELYSPLFL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LYSPLFLI

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
YSPLFLIV

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
SPLFLIVA

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
PLFLIVAA

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LFLIVAAL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FLIVAALV

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LIVAALVF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
IVAALVFL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
VAALVFLI

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
AALVFLIL

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ALVFLILC
>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LVFLILCF

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
VFLILCFT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FLILCFTI

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LILCFTIK

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
ILCFTIKR

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
LCFTIKRK

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
CFTIKRKT

>gi|31581511|gb|AAP33703.1| Orf7a [SARS coronavirus Frankfurt 1]
FTIKRKTE

>gi|30026017|gb|AAP04587.1| RNA-directed RNA polymerase [SARS coronavirus
Taiwan] ILSDDGVX

>gi|30026017|gb|AAP04587.1| RNA-directed RNA polymerase [SARS coronavirus
Taiwan] LSDDGVXV

>gi|30026017|gb|AAP04587.1| RNA-directed RNA polymerase [SARS coronavirus
Taiwan] SDDGVXVL

>gi|30026017|gb|AAP04587.1| RNA-directed RNA polymerase [SARS coronavirus
Taiwan] DDGVXVLN

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] LLIQQWIP

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] LIQQWIPF

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] IQQWIPFM

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] QQWIPFMM

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] QWIPFMMS

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] WIPFMMSR

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] IPFMMSRR

>gi|30275671|gb|AAP30032.1| putative uncharacterized protein 2 [SARS
coronavirus BJ01] PFMMSRRR

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
QIQLSLLQ

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
IQLSLLQV

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
QLSLLQVT

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
LSLLQVTA
>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
SLLQVTAF

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
LLQVTAFQ

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
LQVTAFQH

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
QVTAFQHQ

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
STALQELQ

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
TALQELQI

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
ALQELQIQ

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
LQELQIQQ

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
QELQIQQW

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
ELQIQQWI

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
LQIQQWIQ

>gi|31416297|gb|AAP51229.1| BGI-PUP2 [SARS coronavirus GZ01]
QIQQWIQF

>gi|30795147|gb|AAP41039.1| Orf4 [SARS coronavirus Tor2] LLIQQWIQ
>gi|30795147|gb|AAP41039.1| Orf4 [SARS coronavirus Tor2] LIQQWIQF
>gi|30314342|gb|AAP06763.1| RNA-directed RNA polymerase [SARS coronavirus
Hong Kong/03/2003] QDAVASKI

>gi|30314342|gb|AAP06763.1| RNA-directed RNA polymerase [SARS coronavirus
Hong Kong/03/2003] DAVASKIL

>gi|30314342|gb|AAP06763.1| RNA-directed RNA polymerase [SARS coronavirus
Hong Kong/03/2003] YVDTENNL

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] LACFVLAV

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] ACFVLAVV

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] CFVLAVVY

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] FVLAVVYR

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] VLAVVYRI

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] LAVVYRIN

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] AVVYRINW

>gi|31581509|gb|AAP33701.1| membrane protein M [SARS coronavirus Frankfurt
1] VVYRINWV
>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
HLRMAGHP

>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
LRMAGHPL

>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
RMAGHPLG

>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
MAGHPLGR

>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
AGHPLGRC

>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
GHPLGRCD

>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
HPLGRCDI

>gi|30027623|gb|AAP13444.1| M protein [SARS coronavirus Urbani]
PLGRCDIK

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] LCWKCKSQ

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] CWKCKSQN

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] WKCKSQNP

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] KCKSQNPL

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] CKSQNPLL

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] KSQNPLLY

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] SQNPLLYD

>gi|30275670|gb|AAP30031.1| putative uncharacterized protein 1 [SARS
coronavirus BJ01] QNPLLYDA

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
TDTIVVTA

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
DTIVVTAG

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
TIVVTAGD

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
IVVTAGDG

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
VVTAGDGI

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
VTAGDGIS

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
TAGDGIST

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
AGDGISTP
>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
IGGYSEDW

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
GGYSEDWH

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
GYSEDWHS

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
YSEDWHSG

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
SEDWHSGV

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
EDWHSGVK

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
DWHSGVKD

>gi|31416296|gb|AAP51228.1| BGI-PUP1 [SARS coronavirus GZ01]
WHSGVKDY

>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] FMRFFTLR
>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] MRFFTLRS
>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] RFFTLRSI
>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] FFTLRSIT
>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] FTLRSITA
>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] TLRSITAQ
>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] LRSITAQP
>gi|30795146|gb|AAP41038.1| Orf3 [SARS coronavirus Tor2] RSITAQPV

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] RSSSRSRC

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] SSSRSRCN

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] SSRSRCNS

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] SRSRCNSR

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] RSRCNSRN

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] SRCNSRNS

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] RCNSRNST

>gi|30421455|gb|AAP30714.1| putative nucleocapsid protein [SARS coronavirus
CUHK-Su10] CNSRNSTP

>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
PQGLPNNI

>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
QGLPNNIA

>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
GLPNNIAS

>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
LPNNIASW
>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
PNNIASWF

>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
NNIASWFT

>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
NIASWFTA

>gi|31540949|gb|AAP49024.1| nucleocapsid protein [SARS coronavirus]
IASWFTAL

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
HTSPDVDF

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
TSPDVDFG

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
SPDVDFGD

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
PDVDFGDI

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
DVDFGDIS

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
VDFGDISG

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
DFGDISGI

>gi|31581505|gb|AAP33697.1| spike protein S [SARS coronavirus Frankfurt 1]
FGDISGIN

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
RAILTAFL

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
AILTAFLP

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
ILTAFLPA

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
LTAFLPAQ

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
TAFLPAQD

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
AFLPAQDT

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
FLPAQDTW

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
LPAQDTWG

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
NFRVVPSR

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
FRVVPSRD

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
RVVPSRDV

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
VVPSRDVV
>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
VPSRDVVR

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
PSRDVVRF

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
SRDVVRFP

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
RDVVRFPN

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
VYAWERKR

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
YAWERKRI

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
AWERKRIS

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
WERKRISN

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
ERKRISNC

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
RKRISNCV

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
KRISNCVA

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
RISNCVAD

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
YRVVVLSY

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
RVVVLSYE

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
VVVLSYEL

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
VVLSYELL

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
VLSYELLN

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
LSYELLNA

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
SYELLNAP

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
YELLNAPA

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
YKTPTLKD

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
KTPTLKDF

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
TPTLKDFG

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
PTLKDFGG
>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
TLKDFGGF

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
LKDFGGFN

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
KDFGGFNF

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
DFGGFNFS

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
ILPDPLKS

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
LPDPLKST

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
PDPLKSTK

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
DPLKSTKR

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
PLKSTKRS

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
LKSTKRSF

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
KSTKRSFI

>gi|31416295|gb|AAP51227.1| spike glycoprotein S [SARS coronavirus GZ01]
STKRSFIE

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
ILDISPCA

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
LDISPCAF

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
DISPCAFG

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
ISPCAFGG

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
SPCAFGGV

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
PCAFGGVS

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
CAFGGVSV

>gi|30795145|gb|AAP41037.1| spike glycoprotein [SARS coronavirus Tor2]
AFGGVSVI

>gi|30023954|gb|AAP13567.1| putative E2 glycoprotein precursor [SARS
coronavirus CUHK-W1] AFSPAQDT

>gi|30023954|gb|AAP13567.1| putative E2 glycoprotein precursor [SARS
coronavirus CUHK-W1] FSPAQDTW

>gi|30023954|gb|AAP13567.1| putative E2 glycoprotein precursor [SARS
coronavirus CUHK-W1] SPAQDTWG

>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01] DALCEKAS
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01] ALCEKASK




>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|31416293|gb|AAP51225.1| orf1ab [SARS coronavirus GZ01]
>gi|30795 144|gb| AAP4 1 036. 1 1 replicase 1AB [SARS coronavirus Tor2] 
SADASTFF 

30 >gi|30795 1 44|gb|AAP4 1 036. 1 1 replicase 1 AB [SARS coronavirus Tor2] 
ADASTFFK 

>gi|30795144|gb|AAP41036.1 1 replicase 1AB [SARS coronavirus Tor2] 
DASTFFKR 

>gi|30795144|gb|AAP4 1036.1 1 replicase 1AB [SARS coronavirus Tor2] 

35 ASTFFKRV 

>gi|30795144|gb|AAP4 1036.1 1 replicase 1AB [SARS coronavirus Tor2] 

STFFKRVC 

>gi|30795144|gb|AAP4 1036.1 1 replicase 1AB [SARS coronavirus Tor2] 
TFFKRVCG 

40 >gi|30795 1 44|gb|AAP4 1 036. 1 1 replicase 1 AB [SARS coronavirus Tor2] 
FFKRVCGV 

>gi|30795144|gb|AAP4 1036.1 1 replicase 1AB [SARS coronavirus Tor2] 
FKRVCGVS 

>gi|30795144|gb|AAP4 1036.1 1 replicase 1 AB [SARS coronavirus Tor2] 
45 KRVCGVSA 

>gi|3 1 58 1504|gb|AAP33696.1 1 polyprotein lab [SARS coronavirus Frankfurt 1] 
ELFYSYAI 


LCEKASKY 
CEKASKYL 
EKASKYLP 
KASKYLPI 
ASKYLPID 
SKYLPIDK 
SVIDLLLN 
LLLNDFVE 
LLNDFVEI 
LNDFVEII 
NDFVEIIK 
LVDSDLNE 
VDSDLNEF 
DSDLNEFV 
SDLNEFVS 
DLNEFVSD 
LNEFVSDA 
NEFVSDAD 
EFVSDADS 
ANYIFWRK 
NYIFWRKT 
YIFWRKTN 
IFWRKTNP 
FWRKTNPI 
WRKTNPIQ 
RKTNPIQL 
KTNPIQLS 
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10 


>gi|31581504|gb|AAP33696.1| polyprotein 1ab [SARS coronavirus Frankfurt 1]
LFYSYAIH

>gi|31581504|gb|AAP33696.1| polyprotein 1ab [SARS coronavirus Frankfurt 1]
FYSYAIHH

>gi|31581504|gb|AAP33696.1| polyprotein 1ab [SARS coronavirus Frankfurt 1]
YSYAIHHD

>gi|31581504|gb|AAP33696.1| polyprotein 1ab [SARS coronavirus Frankfurt 1]
SYAIHHDK

>gi|31581504|gb|AAP33696.1| polyprotein 1ab [SARS coronavirus Frankfurt 1]
YAIHHDKF

>gi|31581504|gb|AAP33696.1| polyprotein 1ab [SARS coronavirus Frankfurt 1]
AIHHDKFT

>gi|31581504|gb|AAP33696.1| polyprotein 1ab [SARS coronavirus Frankfurt 1]
IHHDKFTD




EXAMPLE 7: PET-SPECIFIC ANTIBODIES ARE HIGHLY SPECIFIC 
AND HAVE HIGH AFFINITY FOR THEIR PET 
ANTIGENS 

Numerous PET-specific antibodies have been shown to be highly specific and to have high affinity for their respective antigens. The following table lists a few exemplary antibodies showing high affinity (low nanomolar to high picomolar range) for their respective antigens.


Peptide Sequence        Length (aa)   Affinity (KD in nM)   Reference
GATPEDLNQKLAGN          14            1.4                   Cell 91:799, 1997
CRGTGSYNRSSFESSSG       17            2.8                   JIM 249:253, 2001
NYRAYATEPHAKKKS         15            0.5                   EJB 267:1819, 2000
RYDIEAKVTK              10            3.5                   JI 169:6992, 2002
DRVYIHPF                8             0.5                   JIM 254:147, 2001
PQSDPSVEPPLS            12            16 (a scFv)           NG 21:163, 2003
YDVPDYAS (HA tag)       8             2                     engeneOS
MDYKAFDN (FLAG tag)     8             2.3                   engeneOS
HHHHH (HIS tag)         5             25                    Novagen
Furthermore, the table below shows three additional PET-specific antibodies with similar nanomolar-range affinity for their respective antigens:


PET Sequence   Ab name   Affinity (KD in nM)   Parental Protein
EPAELTDA       P1        5                     PSA
YEVQGEVF       C1        31                    CRP
GYSIFSYA       C2        200                   CRP


These PETs are selected based on the criteria set forth in the instant 
specification, including nearest neighbor analysis. Listed below are several nearest 
neighbors of two of the PETs above. 


PET      LSEPAELTDAVK         AA Differences
- NNP1   DEPVELTSAPTGHTFS     2
- NNP2   AGEAAELQDAEVESSAK    2
- NNP3   LQEPAELVESDGVPK      3
- NNP4   AQPAELVDSSGW         3
- NNP5   GLDPTQLTDALTQR       3

PET      YEVQGEVFTK           AA Differences
- NNP1   HVEVNGEVFQK          2
- NNP2   SYEVLGEEFDR          2
- NNP3   QYAVSGEIFWDR         3
- NNP4   VYEEQGEIILK          3
- NNP5   LYEVfcGETYLK         3


PET-specific antibodies are not only high-affinity antibodies, but also highly specific antibodies showing little, if any, cross-reactivity with other closely related peptide sequences.

For example, Figure 24 shows peptide competition results using the peptide competition assay described in Example 5. The left panel shows that antibody P1, which is specific for the PSA-derived 8-mer PET sequence EPAELTDA, can be effectively competed away by the antigen PET (EPAELTDA), with a half-maximum effective peptide concentration of around 40 nM. However, two of its nearest-neighbor 8-mer peptides found in the human proteome with only two- or three-amino-acid differences, EPVELTSA and DPTQLTDA, are completely ineffective even at 1000 µM (25,000-fold higher concentration). Similarly, the right panel shows that antibody C1, which is specific for the CRP-derived 8-mer PET sequence YEVQGEVF, can be effectively competed away by the antigen PET sequence YEVQGEVF, with a half-maximum effective peptide concentration of around 1 µM. However, two of its nearest-neighbor 8-mer peptides found in the human proteome with only two-amino-acid differences, VEVNGEVF and YEVLGEEF, are completely ineffective even at 1000 µM (at least 1,000-fold higher concentration).
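The amino-acid differences and fold-excess figures quoted above can be checked with a short calculation. The following is only an illustrative sketch: the function names `aa_differences` and `fold_excess` are hypothetical and not part of the specification, and the peptides are simply the 8-mers named in the text, compared position by position (a Hamming distance).

```python
def aa_differences(pet: str, neighbor: str) -> int:
    """Count positions at which two equal-length peptides differ."""
    if len(pet) != len(neighbor):
        raise ValueError("peptides must have the same length")
    return sum(a != b for a, b in zip(pet, neighbor))


def fold_excess(competitor_nm: float, half_max_nm: float) -> float:
    """Ratio of a competitor concentration to the half-maximum concentration."""
    return competitor_nm / half_max_nm


# 8-mer PETs and the nearest-neighbor peptides named in the text.
NEIGHBORS = {
    "EPAELTDA": ["EPVELTSA", "DPTQLTDA"],
    "YEVQGEVF": ["VEVNGEVF", "YEVLGEEF"],
}

if __name__ == "__main__":
    for pet, peptides in NEIGHBORS.items():
        for peptide in peptides:
            print(pet, peptide, aa_differences(pet, peptide))
    # 1000 uM expressed in nM, against the ~40 nM half-maximum quoted for EPAELTDA.
    print(fold_excess(1_000_000, 40))
```

A position-by-position count is the simplest notion of "difference"; it ignores insertions and deletions, which a full nearest-neighbor search might also need to consider.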

EXAMPLE 8: ANTIBODY CROSS-REACTIVITY: KALLIKREIN Ab's 

The kallikreins are a subfamily of the serine protease enzyme family (Bhoola et al., Pharmacol Rev 44: 1-80, 1992; Clements J. The molecular biology of the kallikreins and their roles in inflammation. Farmer S. eds. The kinin system 1997: 71-97 Academic Press New York). The human kallikrein gene family was, until recently, thought to include only three members: KLK1, which encodes pancreatic/renal kallikrein (hK1); KLK2, which encodes human glandular kallikrein 2 (hK2); and KLK3, which encodes prostate-specific antigen (PSA; hK3) (Riegman et al., Genomics 14: 6-11, 1992). The best known of the three classic human kallikreins is PSA, an important biomarker for prostate cancer diagnosis and monitoring. Recently, new serine proteases with high degrees of homology to the three classic kallikreins were cloned. These newly identified serine proteases have now been included in the expanded human kallikrein gene family. The entire human kallikrein gene locus on chromosome 19q13.4 now includes 15 genes, designated KLK1-KLK15; their respective proteins are known as hK1-hK15 (Diamandis et al., Clin Chem 46: 1855-1858, 2000).

KLK13, previously known as KLK-L4, is one of the newly identified kallikrein genes. The protein has 47% and 45% sequence identity with PSA and hK2, respectively (Yousef et al., J Biol Chem 275: 11891-11898, 2000). At the mRNA level, KLK13 expression is highest in the mammary gland, prostate, testis, and salivary glands (Yousef, supra). Although the function of KLK13 is still unknown, KLK13, like all other members of the human kallikrein family, is predicted to encode a secreted serine protease that is likely present in biological fluids. Given the prominent role of PSA as a cancer biomarker and the recent demonstration that other members of this gene family are also potential cancer biomarkers (Diamandis et al., Clin Biochem 33: 369-375, 2000; Luo et al., Clin Chem 47: 237-246, 2001; Diamandis et al., Clin Biochem 33: 579-583, 2000; Luo et al., Clin Chim Acta 7: 806-811, 2001; Diamandis et al., Cancer Res 62: 293-300, 2002), hK13 may also have utility as a disease biomarker. In order to develop a suitable method for measuring hK13 protein in biological fluids and tissues with high sensitivity and specificity, and to further investigate the diagnostic and other clinical applications of this protein, Kapadia et al. (Clinical Chemistry 49: 77-86, 2003) cloned and expressed the full-length recombinant human KLK13 in a yeast expression system, and raised KLK13-specific monoclonal and polyclonal antibodies. A sandwich-type assay revealed that the KLK13 antibody is quite specific: recombinant hK1, hK2, hK3, hK4, hK5, hK6, hK7, hK8, hK9, hK10, hK11, hK12, hK14, and hK15 proteins did not produce measurable readings, even at concentrations 1000-fold higher than that of hK13.

However, it should be noted that this type of antibody specificity, defined by cross-reactivity to other related proteins without any epitope information, can frequently be misleading, and thus the data presented in Kapadia et al. should be interpreted with caution. For one thing, unrelated proteins may have higher sequence homology or conformational similarity than family proteins. It may be pure luck that a given hK13 antibody does not cross-react with other highly related family members. However, there is no guarantee that the specific epitope recognized by the hK13 antibody does not appear in other proteins, such as an unidentified kallikrein family member, or an alternative splicing form of hK13. Therefore, antibody specificity is better defined by reactivity to peptides most homologous to a selected PET (nearest neighbor peptides). Antibody cross-reactivity is now readily measurable using peptide competitive assays over a wide dynamic range.

On the other hand, in certain situations, detection of the whole protein family or of a specific subset of the family is needed. For example, it has already been demonstrated that multiple kallikreins are overexpressed in ovarian carcinoma (reviewed in Yousef and Diamandis, Minerva Endocrinol 27: 157-166, 2002). There is experimental evidence that these kallikreins may form a cascade enzymatic pathway similar to the pathways of coagulation and fibrinolysis. Therefore, a single antibody specific for the subset of ovarian carcinoma-associated kallikreins is of particular interest in a clinical setting. Lastly, the range of competitor concentrations used in Kapadia's assay is limited.

These problems can be readily tackled with the approach of the instant invention. For example, the table below lists a common PET for hK1-hK11 (except hK6 and hK7, which have their own common PETs), as well as PETs specific for each of the hK proteins listed. In addition, both the family-specific PET and the protein-specific PET are within the same tryptic fragment.

H SQPWQVAVYSHGWAH CGGVLVHR 
IVGGWECEQH SQPWQAALYHFSTFQ CGGILVHK 
G SQPWQV S LFNGLSFH CAGVLVDR 
N SQPWQV G LFEGTSLR 
HECQPH SQPWQAALFQGQQLL CGGVLVGR 
BDCSPH SQPWQAALVMENELF CSGVLVHR 
V LNTNGTSGF LPGGYTCFPH SQPWQA ALLVQGR 
LL EGPECAPHSQPWQV ALYER 

PNSQPWQAGLFHLTR 


hK1
hK2
hK3
hK4
hK5
hK8
hK9
hK10
hK11

hK6 CVTAGTSCLISGWGSTSSPQLR
hK7 VMDLPTQEPALGTTCYASGWGSIEPEEFLTPK

By using these family- and individual-specific PET antibodies (or other suitable capture reagents), the same tryptic digestion can be used for a sandwich-type assay that captures all tryptic peptides of interest (using the family-specific PET antibodies), followed by selective detection / quantitation of specific family members (using, for example, differentially labeled individual-specific antibodies), preferably in a single experiment.

In addition, the same approach may be used to detect the presence of alternative splicing isoforms of any protein. For example, there are three alternative splicing forms of hK15 (* represents trypsin digestion sites):

hK15-V1 R*LNPQVR*PAVLPTR*CPHPGEACWSGWGLVSHEPGTAGSPR*SQG

hK15-V2 R*LNPQ

hK15-V3 R*LNPQGDSGGPLVCGGILQGIVSWGDVPCDNTTK*PGVYTK

Thus, SGWGLVSH is a PET for detecting V1, with the three nearest neighbor peptides being AGWGIVNH, SGWGITNH, and SGWGMVJE. Similarly, WGDVPCDN, which occurs only in the V3 sequence shown above, is a PET for detecting V3, with the three nearest neighbor peptides being WKDVPCED, WNDAPCDS, and WNDAPCDK.
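The isoform assignment of each PET can be illustrated by checking which hK15 variant carries the tag inside a single tryptic fragment. This is a minimal sketch, not the method of the specification: the names `VARIANTS`, `tryptic_fragments` and `variants_with_pet` are hypothetical, the sequences are the partial ones listed above with internal spaces removed, and the `*` marks are taken as the only cleavage sites.

```python
# Partial hK15 variant sequences as listed in the text; "*" marks a trypsin site.
VARIANTS = {
    "hK15-V1": "R*LNPQVR*PAVLPTR*CPHPGEACWSGWGLVSHEPGTAGSPR*SQG",
    "hK15-V2": "R*LNPQ",
    "hK15-V3": "R*LNPQGDSGGPLVCGGILQGIVSWGDVPCDNTTK*PGVYTK",
}


def tryptic_fragments(marked: str) -> list[str]:
    """Split a sequence at the '*' cleavage marks, dropping empty pieces."""
    return [fragment for fragment in marked.split("*") if fragment]


def variants_with_pet(pet: str) -> list[str]:
    """Names of variants in which the PET lies wholly within one tryptic fragment."""
    return [
        name
        for name, marked in VARIANTS.items()
        if any(pet in fragment for fragment in tryptic_fragments(marked))
    ]


if __name__ == "__main__":
    for pet in ("SGWGLVSH", "WGDVPCDN"):
        print(pet, variants_with_pet(pet))
```

Requiring the tag to sit within one fragment matters because a tag that spans a cleavage site would be destroyed by the digestion step.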

EXAMPLE 9: DETECTING SERUM PROTEIN LEVELS 

Due to the fundamental problems in measuring an antigen that exists in more than one form and/or is present in different complexes, it may be difficult to reach a consensus on the total level of a serum protein (such as TGF-beta1 protein) in normal human plasma. The instant invention provides a method that efficiently solves these problems.

Figure 21 shows a design for the PET-based assay for standardized serum TGF-beta measurement. The C-terminal monomer of the mature TGF-beta is represented in the top panel as a red bar. The sequences below indicate the PETs specific for each of the 4 TGF-beta isoforms and their respective nearest neighbors. The PET-based assay can be used to specifically detect one of the TGF-beta isoforms, as well as the total amount of all TGF-beta isoforms present in a serum sample.

Generally, the nomenclature used herein and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Md. (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988); Watson et al., "Recombinant DNA", Scientific American Books, New York; Birren et al. (eds) "Genome Analysis: A Laboratory Manual Series", Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); methodologies as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; "Cell Biology: A Laboratory Handbook", Volumes I-III Celis, J. E., ed. (1994); "Current Protocols in Immunology" Volumes I-III Coligan J. E., ed. (1994); Stites et al. (eds), "Basic and Clinical Immunology" (8th Edition), Appleton & Lange, Norwalk, CT (1994); Mishell and Shiigi (eds), "Selected Methods in Cellular Immunology", W. H. Freeman and Co., New York (1980); available immunoassays are extensively described in the patent and scientific literature, see, for example, U.S. Pat. Nos. 3,791,932; 3,839,153; 3,850,752; 3,850,578; 3,853,987; 3,867,517; 3,879,262; 3,901,654; 3,935,074; 3,984,533; 3,996,345; 4,034,074; 4,098,876; 4,879,219; 5,011,771 and 5,281,521; "Oligonucleotide Synthesis" Gait, M. J., ed. (1984); "Nucleic Acid Hybridization" Hames, B. D., and Higgins S. J., eds. (1985); "Transcription and Translation" Hames, B. D., and Higgins S. J., eds. (1984); "Animal Cell Culture" Freshney, R. I., ed. (1986); "Immobilized Cells and Enzymes" IRL Press, (1986); "A Practical Guide to Molecular Cloning" Perbal, B., (1984) and "Methods in Enzymology" Vol. 1-317, Academic Press; "PCR Protocols: A Guide To Methods And Applications", Academic Press, San Diego, Calif. (1990); Marshak et al., "Strategies for Protein Purification and Characterization: A Laboratory Course Manual" CSHL Press (1996); all of which are incorporated by reference as if fully set forth herein. Other general references are provided throughout this document. The procedures therein are believed to be well known in the art and are provided for the convenience of the reader. All the information contained therein is incorporated herein by reference.

Equivalents 

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.
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Claims: 

1. A method for achieving high sensitivity detection and/or high accuracy quantitation of a target protein in a biological sample, comprising:

(1) providing two or more different capture agents for detecting a target protein in a test sample, which capture agents are provided as an addressable array, and each of which capture agents selectively interacts with a peptide epitope tag (PET) of said target protein;

(2) contacting said array with a solution of polypeptide analytes produced by denaturation and/or cleavage of proteins from the test sample;

(3) detecting the presence and amount of said target protein in the sample from the interaction of said polypeptide analytes with each said capture agents;

(4) quantitating, if present, the amount of the target protein in the sample by averaging the results obtained from each said capture agents in (3).

2. The method of claim 1, wherein each said different capture agents specifically bind a different PET of said target protein.

3. The method of claim 2, wherein said different capture agents belong to the same category of capture agent.

4. The method of claim 3, wherein said category of capture agent includes: antibody, non-antibody polypeptide, PNA (peptide nucleic acids), scaffolded peptide, peptidomimetic compound, polynucleotide, carbohydrates, artificial polymers, plastibody, chimeric binding agent derived from low-affinity ligand, and small organic molecules.

5. The method of claim 2, wherein at least two of said different capture agents belong to different categories of capture agent selected from antibody, non-antibody polypeptide, PNA (peptide nucleic acids), scaffolded peptide, peptidomimetic compound, polynucleotide, carbohydrates, artificial polymers, plastibody, chimeric binding agent derived from low-affinity ligand, and small organic molecules.

6. The method of claim 1, wherein a subset of said capture agents bind to the same PET, and wherein each capture agents of said subset belong to different category of capture agent selected from: antibody, non-antibody polypeptide, PNA (peptide nucleic acids), scaffolded peptide, peptidomimetic compound, polynucleotide, carbohydrates, artificial polymers, plastibody, chimeric binding agent derived from low-affinity ligand, and small organic molecules.

7. The method of claim 1, wherein said target protein has two or more different forms within said biological sample.

8. The method of claim 7, wherein said different forms include unprocessed / pro-form and processed / mature form.

9. The method of claim 7, wherein said different forms include different alternative splicing forms.

10. The method of claim 7, wherein said different forms include unmodified and post-translationally modified form with respect to one or more post-translational modification(s).

11. The method of claim 10, wherein said post-translational modification includes: acetylation, amidation, deamidation, prenylation, formylation, glycosylation, hydroxylation, methylation, myristoylation, phosphorylation, ubiquitination, ribosylation and sulphation.

12. The method of claim 7, wherein a subset of said capture agents are specific for PET(s) only found in certain forms but not in other forms.

13. The method of claim 12, further comprising determining the percentage of one form of said target protein as compared to the total target protein, or ratio of a first form of said target protein to a second form of said target protein.

14. The method of claim 1, further comprising detecting other target proteins within said biological sample with capture agents specific for PETs of said other target proteins.

15. The method of claim 14, wherein two or more different capture agents are used for detecting and/or quantitating at least one of said other target proteins.

16. The method of claim 1, wherein, for each capture agent, the method has a regression coefficient (R^2) of 0.95 or greater.

17. The method of claim 1, wherein the array has a recovery rate of at least 50 percent.

18. The method of claim 1, wherein the accuracy is 90%. 

19. The method of claim 1, wherein said sample is a body fluid selected from: saliva, mucous, sweat, whole blood, serum, urine, amniotic fluid, genital fluid, fecal material, marrow, plasma, spinal fluid, pericardial fluid, gastric fluid, abdominal fluid, peritoneal fluid, pleural fluid, synovial fluid, cyst fluid, cerebrospinal fluid, lung lavage fluid, lymphatic fluid, tears, prostatic fluid, extraction from other body parts, or secretion from other glands; or from supernatant, whole cell lysate, or cell fraction obtained by lysis and fractionation of cellular material, extract or fraction of cells obtained directly from a biological entity or cells grown in an artificial environment.

20. The method of claim 1, wherein said sample is obtained from human, mouse, rat, frog (Xenopus), fish (zebra fish), fly (Drosophila melanogaster), nematode (C. elegans), fission or budding yeast, or plant (Arabidopsis thaliana).

21. The method of claim 1, wherein said sample is produced by treatment of membrane bound proteins.

22. The method of claim 1, wherein step (3) is effectuated by directly detecting and measuring captured PET-containing polypeptides using mass spectrometry, colorimetric resonant reflection using a SWS or SRVD biosensor, surface plasmon resonance (SPR), interferometry, gravimetry, ellipsometry, an evanescent wave device, resonance light scattering, reflectometry, a fluorescent polymer superquenching-based bioassay, or arrays of nanosensors comprising nanowires or nanotubes.

23. The method of claim 1, wherein step (3) is effectuated by using secondary capture agents specific for captured polypeptide analytes, wherein said secondary capture agent is labeled by a detectable moiety selected from: an enzyme, a fluorescent label, a stainable dye, a chemiluminescent compound, a colloidal particle, a radioactive isotope, a near-infrared dye, a DNA dendrimer, a water-soluble quantum dot, a latex bead, a selenium particle, or a europium nanoparticle.

24. The method of claim 23, wherein said secondary capture agent is specific for a post-translational modification.

25. The method of claim 24, wherein said secondary capture agent is a labeled secondary antibody specific for phosphorylated tyrosine, phosphorylated serine, or phosphorylated threonine.

26. The method of claim 1, wherein said sample contains billion molar excess of unrelated proteins or fragments thereof relative to said target protein.

27. The method of claim 1, wherein said PET is identified based on one or more of the protein sources selected from: sequenced genome or virtually translated proteome, virtually translated transcriptome, or mass spectrometry database of tryptic fragments.

28. The method of claim 1, wherein the target protein is a biomarker with a concentration of about 1-5 pM in said sample.

29. The method of claim 1, wherein the target protein is a biomarker with relatively small concentration change of no more than 50%, 40%, 30%, 20%, 10%, 5%, or 1% in a disease sample.

30. An array of capture agents for detecting and quantitating a target protein within a biological sample, comprising a plurality of capture agents, each immobilized on a distinct addressable location on solid support, each of said capture agents specifically binds a PET uniquely associated with a peptide fragment of said target protein that predictably results from a treatment of said biological sample.

31. The array of claim 30, wherein said solid support is beads or an array device in a manner that encodes the identity of said capture agents disposed thereon.

32. The array of claim 29, wherein said array includes 2 - 100 or more different capture agents.

33. The array of claim 29, wherein said array device includes a diffractive grating surface.

34. The array of claim 29, wherein said capture agents are antibodies or antigen binding portions thereof, and said array is an arrayed ELISA.

35. The array of claim 29, wherein said array device is a surface plasmon resonance array.

36. The array of claim 29, wherein said beads are encoded as a virtual array.

37. A composition comprising a plurality of capture agents, wherein each of said capture agents recognizes and interacts with one PET of a target protein.

38. The composition of claim 37, wherein said capture agents is independently selected from: antibody, non-antibody polypeptide, PNA (peptide nucleic acids), scaffolded peptide, peptidomimetic compound, polynucleotide, carbohydrates, artificial polymers, plastibody, chimeric binding agent derived from low-affinity ligand, and small organic molecules.

39. The composition of claim 38, wherein said capture agents are antibodies, or antigen binding fragments thereof.

40. The composition of claim 39, wherein said capture agent is a full-length antibody, or a functional antibody fragment selected from: an Fab fragment, an F(ab')2 fragment, an Fd fragment, an Fv fragment, a dAb fragment, an isolated complementarity determining region (CDR), a single chain antibody (scFv), or derivative thereof.

41. The composition of claim 39, wherein each of said capture agents is a single chain antibody.
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Figure 4 


Chemokine Receptor CXCR4 Western

Lanes: M  M  H-S  H-P  M-S  M-P

M: Protein Size Marker
H-S: HELA-Supernatant
H-P: HELA-Pellet
M-S: MOLT4-Supernatant
M-P: MOLT4-Pellet

1) Cells are washed in PBS
2) Cells are suspended (5x10^6 cells/ml) in a buffer with 0.5% Triton X-100
3) Cells are homogenized in a Dounce homogenizer (30 strokes)
4) Centrifuge and load the soluble and pellet fractions onto the gel

>90% of total proteins are solubilized
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Figure 5 


NH2 - PET - COOH

Blocked termini

KLH

Rabbits or Mice

Standard Ab Generation Procedure

Parental Tryptic Fragment: use this to affinity purify antibodies
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Figure 6 


Protein Samples
Protein Extraction/Dilution

Route 1: Denature, Reducing, Alkylation; Trypsin Digestion; Desalting
Route 2: Thermal Denaturation; Trypsin Digestion

Peptide Assay
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Figure 7 


100ul Human Serum (75mg/ml) 


Human Serum Protein (4mg/ml) Dilute 10X to 7.5mg/ml
In 1.8ml of 50mM HEPES buffer 
(pH 8.0), 8M urea and 10mM DTT 


Add iodoacetamide at 25mM qqoq f or 5 minutes 


Dilute to 1 mg/ml and digest Digest for 30 minutes at 55°C 


Desalting columns Inactivate trypsin at 80°C 
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Figure 8 



12345 678910 
M S S-T S-TC M M-T M-C H H-T H-C 
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Figure 10 


A kinase (of a family of kinases) with known phosphorylation consensus

AKT or protein kinase B phosphorylates Ser or Thr in the motif RXRXXS/T

1) Identify all human sequences with RxRxxS/T motif
2) Define tryptic fragment for each protein containing the motif
3) Define PETs and raise antibodies

Anti-AKT Motif Antibody

PET antibody
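The first two steps of the Figure 10 workflow can be sketched as a motif scan followed by an in silico digest. This is an illustrative sketch only: the sequence `TOY_PROTEIN` is made up for the example, the helper names are hypothetical, and trypsin is assumed to cleave after K or R except when the next residue is P (a common simplification that the figure itself does not state).

```python
import re

# RxRxxS/T consensus; the lookahead lets overlapping matches be reported.
MOTIF = re.compile(r"(?=(R.R..[ST]))")

# Hypothetical sequence used purely for illustration.
TOY_PROTEIN = "MAGRPRTTSFAESCKPLQ"


def motif_sites(seq: str) -> list[int]:
    """0-based index of the Ser/Thr residue in every RxRxxS/T match."""
    return [match.start() + 5 for match in MOTIF.finditer(seq)]


def tryptic_fragments(seq: str) -> list[tuple[int, str]]:
    """(start index, fragment) pairs, cleaving after K/R unless followed by P."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        last = i + 1 == len(seq)
        if residue in "KR" and (last or seq[i + 1] != "P"):
            fragments.append((start, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        fragments.append((start, seq[start:]))
    return fragments


def fragment_containing(seq: str, position: int) -> str:
    """The tryptic fragment that covers the given residue index."""
    for start, fragment in tryptic_fragments(seq):
        if start <= position < start + len(fragment):
            return fragment
    raise ValueError("position outside sequence")


if __name__ == "__main__":
    for site in motif_sites(TOY_PROTEIN):
        print(site, fragment_containing(TOY_PROTEIN, site))
```

Because the consensus itself contains arginines, the digest usually separates the phosphorylated residue from the upstream part of the motif, which is why the fragment is defined per site and not per motif match.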
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Figure 1 1 


A Sequenced Genome

Predicted Protein Sequences

Parse Protein Sequences Into Overlapping Peptides of 4-10 Amino Acids In Length

All Possible Peptide Tags of N Amino Acids: 20^N (N=4-10)
Peptide Tag Database

Peptide Occurrence Database
1 occurrence: PETs
>1 occurrence: Non-PETs

A protein of X amino acids gives (X-N+1) overlapping peptides of N amino acids
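The parsing and occurrence-counting steps of Figure 11 can be sketched in a few lines. This is a minimal sketch under stated assumptions: the two-protein `TOY_PROTEOME` is invented for illustration, the function names are hypothetical, and "occurrence" is interpreted here as the number of distinct proteins containing a tag, which is one possible reading of the figure.

```python
from collections import defaultdict

# Hypothetical two-protein "proteome" used purely for illustration.
TOY_PROTEOME = {"A": "MKTAYIAK", "B": "GKTAYLLE"}


def overlapping_peptides(seq: str, n: int) -> list[str]:
    """All X-N+1 overlapping peptides of length n from a protein of length X."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def classify_tags(proteome: dict[str, str], n: int):
    """Split tags into PETs (found in one protein) and non-PETs (found in several)."""
    owners = defaultdict(set)
    for name, seq in proteome.items():
        for tag in overlapping_peptides(seq, n):
            owners[tag].add(name)
    pets = {tag: next(iter(names)) for tag, names in owners.items() if len(names) == 1}
    non_pets = {tag for tag, names in owners.items() if len(names) > 1}
    return pets, non_pets


if __name__ == "__main__":
    pets, non_pets = classify_tags(TOY_PROTEOME, 4)
    print(sorted(pets.items()))
    print(sorted(non_pets))
```

On a real proteome the dictionary of owners grows to millions of entries, so a production version would likely stream sequences and store counts more compactly.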
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Figure 12 


Tag Length      All Possible Tags   Total PETs    Total Non-PETs
(amino acids)                                     (non-redundant)
4               160,000             745           158,862
5               3.2x10^6            560,309       1,684,684
6               6.4x10^7            4,609,172     2,350,532
7               1.28x10^9           6,652,224     1,848,908
8               2.56x10^10          7,018,340     1,744,029
9               5.12x10^11          7,138,933     1,714,971
10              1.02x10^13          7,216,090     1,695,512

29,076 human protein sequences analyzed
~12M overlapping 4-10mer peptides
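The "All Possible Tags" column of Figure 12 is simply 20^N for the 20 standard amino acids, and the other two columns show how sparsely the human proteome samples that space as N grows. The sketch below recomputes both; the counts in `FIGURE_12` are copied from the figure, while the names `tag_space` and `observed_fraction` are hypothetical helpers introduced for this illustration.

```python
# (total PETs, total non-redundant non-PETs) per tag length, copied from Figure 12.
FIGURE_12 = {
    4: (745, 158_862),
    5: (560_309, 1_684_684),
    6: (4_609_172, 2_350_532),
    7: (6_652_224, 1_848_908),
    8: (7_018_340, 1_744_029),
    9: (7_138_933, 1_714_971),
    10: (7_216_090, 1_695_512),
}


def tag_space(n: int) -> int:
    """Number of possible peptide tags of length n over 20 amino acids."""
    return 20 ** n


def observed_fraction(n: int) -> float:
    """Share of the theoretical tag space actually seen in the analyzed proteome."""
    pets, non_pets = FIGURE_12[n]
    return (pets + non_pets) / tag_space(n)


if __name__ == "__main__":
    for n in sorted(FIGURE_12):
        print(n, tag_space(n), f"{observed_fraction(n):.3g}")
```

At N=4 nearly every possible tag occurs somewhere in the proteome, which is consistent with the very small number of 4-mer PETs in the figure.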
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Figure 13 


Length of PET     Proteins with PETs   Proteome Coverage
(amino acids)
4                 684                  2.35%
5                 23,446               80.64%
6                 26,069               89.66%
7                 26,184               90.05%
8                 26,216               90.16%
9                 26,238               90.24%
10                26,250               90.28%

[Plot: Average PET per Tagged Protein (y-axis, 0 to 300) versus PET Length in amino acids (x-axis, 5 to 10)]
PET Length       Average PETs/Tagged Protein
(Amino Acids)    No Cleavage    Trypsin Cleavage
5                24             16
6                177            98
8                268            129

[Plots: x-axes "Fragment Number" and "Fragment Size (amino acids)"]
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Figure 15 
Concentration of Specific Target Peptide

- Buffer Control
- Digested Total Human Protein Extract
- Specific Target Peptide
- Specific Target Peptide + Digested Total Human Protein Extract

Fluorescent Sandwich Peptide Assay

Target Peptide + Trypsin Digested Human Serum Proteins (10mM)
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FLAG Tag MYC Tag 
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protein concentration (nM) 



0nM 1nM 5nM 10nM 15nM 25nM 50nM 100nM
peptide concentration (nM)

PSA Protein Sandwich Assay

HA-HIS Peptide Sandwich Assay

(PSA secreted in conditioned media)
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Figure 1 7 


SHIP-2

NSFNNPApYYVLEGVPHQLLPPEPPpSPAR

Detect pY using Anti-phosphoTyrosine Ab as secondary detection

NNP1 YLTEGVPH
NNP2 YILESMPH
NNP3 YVIMGMPH

Detect pS using Anti-phosphoSerine Ab as secondary detection

ABL

LGGGQpYGEVYEGVWK

PET-Ab

NNP1 EVYVGVWK
NNP2 EVFEGLWK
NNP3 EVYEGVYT

Detect pY using Anti-phosphoTyrosine Ab as secondary detection
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Figure 18 



Top 20 Common hexamers 
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Figure 19 


Prrvtein 

1 1 UICI 1 1 

Parental Trvptic Peptide 

Also Detect 

BRAF 

WSGSHQFEQLSGSILWMAPEVIK W 


DLK 

MSFAGTVAWMAPEV I R 


GCK 

SFIGTPYWMAPEVAAVbK 


HH4 98 

WMAFEVr iUulK 


HPK1 
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Figure 23 
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